About this role

NVIDIA is a leading technology company specializing in AI and deep learning. It is seeking an experienced software professional to improve the performance of deep learning systems using CUDA, with a focus on model optimization and custom kernel development.

Responsibilities:

Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping
Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments
Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads
Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines
Collaborate closely with AI researchers, hardware and software architects, kernel and compiler authors, and CUDA driver experts to co-design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross-node network communication efficiency, and programmability
Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning
Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products

Senior Software Engineer, CUDA Deep Learning Systems
