RadixArk is an infrastructure-first company focused on democratizing frontier-level AI infrastructure. The company is seeking a Member of Technical Staff, Inference, to design and build large-scale inference systems for frontier AI models, optimize their performance in production, and collaborate with kernel, compiler, and systems teams on performance-critical problems.
Responsibilities:
- Design and build large-scale inference systems for frontier AI models
- Optimize latency, throughput, and GPU utilization in production inference
- Develop and improve model serving architectures and runtimes
- Work on batching, scheduling, and memory management strategies
- Collaborate with kernel, compiler, and systems teams on performance optimization
- Debug performance bottlenecks across the stack
- Drive reliability and scalability of inference infrastructure
- Build tooling for observability, profiling, and performance analysis
- Contribute to long-term inference architecture and strategy