Reddit is a community-driven platform known for its open conversations and vast user base. The company is seeking a Staff Machine Learning Engineer to lead the development of a large-scale ML Inference Platform that supports various teams and enhances user experiences through advanced AI systems.

Responsibilities:

Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs, supporting millions of QPS
Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale
Rapidly prototype and build a high-performance feature hydration and processing system as part of the inference stack, including routing, caching, and batching
Lead a unified GPU model export framework to support converting trained models into optimized GPU inference models

Requirements:

7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles
Experience operating orchestration systems such as Kubernetes at scale
Deep experience with cloud-based technologies for supporting an ML platform, including tools like AWS, Google Cloud Storage, infrastructure-as-code (Terraform), and more
Proficiency with the common programming languages and frameworks of ML, such as Go and Python
Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders
Strong focus on scalability, reliability, performance, and ease of use
Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems
Strong proficiency in Python and deep experience with modern AI/ML frameworks (Triton, Dynamo, vLLM, PyTorch)
Strong understanding of real-time ML observability to track feature/model performance
Experience working with LLM serving online at scale
Experience building an end-to-end (E2E) inference performance benchmarking framework
Deep understanding of multi-cluster compute environments and network topologies specific to ML inference use cases

Staff Machine Learning Engineer, AI Serving