Sesame is a company focused on creating lifelike computers that can interact naturally with humans. It is seeking an ML Model Serving Engineer to enhance its serving layer for a variety of models, collaborating with infrastructure and training engineers to develop a reliable and efficient serving system.
Responsibilities:
- Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models
- Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category
- Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving
- Work with the training team to identify opportunities to produce faster models without sacrificing quality
- Use techniques like in-flight batching, caching, and custom kernels to speed up inference
- Find ways to reduce model initialization times without sacrificing quality