Nuance Labs is building the next generation of emotionally expressive, real-time AI. This role focuses on developing the infrastructure that supports their AI platform, ensuring reliable and low-latency performance for real-time video applications.
Responsibilities:
- Own Inference Infrastructure: Build and maintain the serving stack for multimodal AI workloads. Optimize for latency, throughput, and cost using batching strategies, autoscaling, and intelligent resource allocation
- Real-Time Video Streaming: Architect systems to handle long-lived WebRTC connections with unpredictable client behavior, ensuring smooth video and audio delivery at scale
- Orchestrate Data Workflows: Build robust pipelines for offline processing, evaluation, and training using orchestration frameworks like Dagster or Ray. Manage petabyte-scale video storage and network requirements
- GPU Cluster Management: Configure and maintain GPU clusters using Kubernetes and Terraform. Implement monitoring, autoscaling based on custom metrics, and cost optimization strategies
- Developer Tooling: Build CI/CD, evaluation, and versioning systems that enable safe, zero-downtime model deployments and rapid iteration cycles