Nuance Labs builds advanced AI avatars that interact in real time with emotional intelligence. The Member of Technical Staff will develop and scale reinforcement learning and post-training methods for the company's omni models, ensuring high-quality interactive behavior and system efficiency.
Responsibilities:
- Build Nuance’s RL/post-training stack from 0→1: rollout generation, policy optimization, reward/reference model serving, data feedback loops, evaluation, checkpointing, observability, and debugging
- Develop and scale post-training methods such as PPO, GRPO, DPO, rejection sampling, RLHF/RLAIF, online RL, and model-based data improvement
- Design the systems abstractions that connect research ideas to production-scale RL runs: trainers, rollout workers, reward models, evaluators, data queues, experience buffers, and checkpoint promotion
- Build evaluation and feedback loops for omni behavior: turn-taking, interruption, timing, emotional response, audiovisual coherence, instruction following, and real-time interaction quality
- Optimize the end-to-end post-training loop across rollout throughput, serving latency, GPU utilization, policy update efficiency, queueing, checkpoint overhead, and research iteration speed
- Evolve the platform as algorithms, model architectures, reward definitions, data sources, and evaluation methods change