NVIDIA is a technology company known for its work in computing and AI. It is seeking a Senior Software Engineer to contribute to the NeMo Platform, with a focus on building and improving AI systems through evaluation and infrastructure development.

Responsibilities:

Design and implement Python-first APIs, SDK workflows, and plugin interfaces for building, measuring, and improving agents across multiple runtimes and product surfaces
Build reusable systems for observing behavior, measuring progress, detecting regressions, and turning runtime evidence into product decisions
Build systems for ingesting, normalizing, validating, and analyzing agent execution data and evaluation datasets
Partner with research, product, platform, and infrastructure teams to integrate agentic capabilities broadly across NVIDIA agent runtimes and developer workflows
Help turn emerging agent development and improvement techniques into reliable, reusable product capabilities
Improve reliability, observability, debuggability, and performance across NeMoStack services, SDKs, plugins, jobs, and developer workflows
Build strong test coverage across unit, integration, E2E, Docker, and Kubernetes workflows
Drive “speed of light” engineering: fast iteration, high ownership, pragmatic decisions, and performance-minded implementation under production constraints
Provide senior technical leadership through design reviews, code reviews, mentoring, and ownership of ambiguous cross-component problems

Requirements:

BS, MS, or equivalent experience in Computer Science, Computer Engineering, or a related technical field
5+ years of professional software engineering experience building production systems
Excellent Python engineering skills, including API design, typing, testing, debugging, performance analysis, and maintainable software design
Experience designing SDKs, libraries, plugins, CLIs, or other developer-facing interfaces
Experience with distributed systems, cloud-native services, containers, Kubernetes, or job orchestration
Strong understanding of reliability, scalability, security, and performance tradeoffs in production infrastructure
Experience with structured data modeling and validation systems such as Pydantic, typed schemas, event/trace models, or SDK-generated types
Ability to work independently, define technical scope, break down ambiguous problems, and drive work across team boundaries
Clear communication skills and a track record of collaborating with engineering, product, research, or customer-facing teams
Experience building, deploying, and iterating on production agentic AI systems where evaluation was used to measure and improve real product outcomes
Experience designing evaluation workflows for heterogeneous agents, including tool-using agents, RAG agents, workflow agents, coding agents, or long-running autonomous systems
Experience integrating evaluation capabilities across multiple products, runtimes, or internal platforms, especially through Python SDKs, plugins, or shared developer tooling
Strong ability to connect technical evaluation work to business outcomes, product quality, user experience, reliability, or operational efficiency
Experience with enterprise AI systems where measurement, regression testing, observability, governance, and continuous improvement are required for production deployment

Senior Software Engineer, Agentic Systems
