Nuance Labs builds advanced AI avatars that interact in real time with emotional intelligence. The Member of Technical Staff will develop and scale reinforcement learning and post-training methods for the company's omni models, ensuring high-quality interactive behavior and system efficiency.

Responsibilities:

Build Nuance’s RL/post-training stack from 0→1: rollout generation, policy optimization, reward/reference model serving, data feedback loops, evaluation, checkpointing, observability, and debugging
Develop and scale post-training methods such as PPO, GRPO, DPO, rejection sampling, RLHF/RLAIF, online RL, and model-based data improvement
Design the systems abstractions that connect research ideas to production-scale RL runs: trainers, rollout workers, reward models, evaluators, data queues, experience buffers, and checkpoint promotion
Build evaluation and feedback loops for omni behavior: turn-taking, interruption, timing, emotional response, audiovisual coherence, instruction following, and real-time interaction quality
Optimize the end-to-end post-training loop across rollout throughput, serving latency, GPU utilization, policy update efficiency, queueing, checkpoint overhead, and research iteration speed
Evolve the platform as algorithms, model architectures, reward definitions, data sources, and evaluation methods change

Member of Technical Staff — RL Research
