RadixArk is an infrastructure-first company working to democratize frontier-level AI infrastructure. It is seeking a Performance Engineer to improve the performance of its production systems, particularly LLM inference and training workloads.
Responsibilities:
- Analyze and improve performance across SGLang, Miles, and RadixArk production deployments
- Benchmark LLM inference and training workloads across GPUs, TPUs, and cloud environments
- Optimize latency, throughput, memory usage, batching, scheduling, routing, and GPU utilization
- Investigate performance regressions in real customer environments
- Work closely with kernel, runtime, distributed systems, and product engineers
- Build internal tooling for profiling, tracing, benchmarking, and regression detection
- Translate customer workload characteristics into concrete performance tuning strategies
- Help define the performance metrics that matter commercially, including cost per token and serving efficiency
- Partner with customers and cloud partners on deep technical evaluations
- Contribute performance insights back to open-source SGLang and Miles