Motional is a driverless technology company focused on making autonomous vehicles a safe and reliable reality. They are seeking a Machine Learning Systems Engineer to join their ML Acceleration team, responsible for optimizing systems that enable large-scale model training with an emphasis on speed, cost, reliability, and throughput.
Responsibilities:
- Use profiling tools (e.g., Nsight, PyTorch Profiler) to identify bottlenecks in data loading, gradient computation, and communication, and implement optimizations such as kernel fusion, sharding, and tiling to reduce step time
- Optimize distributed training pipelines using frameworks such as PyTorch Distributed
- Design and maintain high-performance GPU kernels in Triton or CUDA for state-of-the-art ML workloads
- Optimize data loading pipelines for robustness and maximum training throughput
Requirements:
- Bachelor's degree, Master's degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline
- Strong proficiency in Python
- Extensive hands-on experience with PyTorch
- Experience optimizing machine learning model execution during training and inference, alongside a strong understanding of fundamental machine learning concepts, architectures, and processes
- Exceptional analytical and problem-solving skills, with a bias for action and a data-driven approach to technical challenges