Lemurian Labs is reimagining the foundations of computing to make AI accessible to everyone. They are looking for a Senior ML Performance Engineer to architect and lead their Performance Testing Platform, focusing on measuring and optimizing the performance of large language models on modern GPU architectures.
Responsibilities:
- Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
- Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
- Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
- Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
- Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
- Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
- Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
- Document best practices for performance testing and optimization of ML workloads on GPU hardware