Thinking Machines Lab is dedicated to empowering humanity by advancing collaborative general intelligence. The company is seeking an infrastructure research engineer to design, optimize, and scale the systems that determine the performance of large AI models, so that experiments and deployments run smoothly.

Responsibilities:

Work alongside researchers and engineers to bring cutting-edge AI models into production
Collaborate with research teams to enable high-performance inference for novel architectures
Design and implement new techniques, tools, and architectures that improve performance, latency, throughput, and efficiency
Optimize our codebase and compute fleet (e.g., GPUs) to fully utilize hardware FLOPs, bandwidth, and memory
Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving
Establish standards for reliability, observability, and reproducibility across the inference stack
Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure

Research Engineer, Infrastructure, Inference

About this role

Responsibilities: