Thinking Machines Lab is dedicated to empowering humanity by advancing collaborative general intelligence. They are seeking an infrastructure research engineer to design, optimize, and scale the systems that improve the performance of large AI models and keep experiments and deployments running smoothly.
Responsibilities:
- Work alongside researchers and engineers to bring cutting-edge AI models into production
- Collaborate with research teams to enable high-performance inference for novel architectures
- Design and implement new techniques, tools, and architectures that improve performance, latency, throughput, and efficiency
- Optimize our codebase and compute fleet (e.g., GPUs) to fully utilize hardware FLOPs, bandwidth, and memory
- Extend orchestration frameworks (e.g., Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving
- Establish standards for reliability, observability, and reproducibility across the inference stack
- Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure