RadixArk is an infrastructure-first company focused on democratizing frontier-level AI infrastructure. The company is seeking a Performance Engineer to improve the performance of its production systems, particularly LLM inference and training workloads.

Responsibilities:

Analyze and improve performance across SGLang, Miles, and RadixArk production deployments
Benchmark LLM inference and training workloads across GPUs, TPUs, and cloud environments
Optimize latency, throughput, memory usage, batching, scheduling, routing, and GPU utilization
Investigate performance regressions in real customer environments
Work closely with kernel, runtime, distributed systems, and product engineers
Build internal tooling for profiling, tracing, benchmarking, and regression detection
Translate customer workload characteristics into concrete performance tuning strategies
Help define performance metrics that matter commercially, including cost-per-token and serving efficiency
Partner with customers and cloud partners on deep technical evaluations
Contribute performance insights back to open-source SGLang and Miles
