Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs. The Senior Software Engineer, LLM Performance, plays a crucial role in efficiently scheduling, executing, and managing AI workloads on distributed compute systems, with a focus on optimizing performance and sustainability for generative AI applications.
Responsibilities:
- Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments
- Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g., use Python to improve API servers or request schedulers)
- Work close to the hardware, building and integrating solutions that boost performance and hardware utilization. For example, improve attention backends such as FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating them into vLLM
- Improve LLM performance using advanced algorithmic solutions such as speculative decoding, quantization, or other state-of-the-art techniques. Understand the impact of such techniques on model quality