Sesame is a company focused on creating lifelike computers that can interact naturally with humans. They are seeking an ML Model Serving Engineer to enhance their serving layer for various models, collaborating with infrastructure and training engineers to develop a reliable and efficient serving system.

Responsibilities:

Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models
Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category
Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving
Work with the training team to identify opportunities to produce faster models without sacrificing quality
Use techniques like in-flight batching, caching, and custom kernels to speed up inference
Find ways to reduce model initialization times without sacrificing quality
