NVIDIA is a leader in visual computing and AI technology, dedicated to advancing energy simulation and AI workflows. The Developer Technology Engineer will focus on optimizing CUDA performance for energy-related workloads and collaborate with engineering teams to enhance GPU performance.
Responsibilities:
- Profile, analyze, and optimize GPU-accelerated applications with emphasis on CUDA kernels, memory movement, concurrency, and end-to-end throughput
- Drive performance improvements across the stack:
  - CUDA C++ kernel optimization, launch configuration, memory hierarchy, streams/events
  - GPU libraries (as applicable): cuBLAS, cuFFT, cuSPARSE, cuSOLVER, NCCL
  - Multi-GPU and multi-node scaling using MPI + NCCL, CPU/GPU overlap, communication patterns
- Build reproducible benchmarks, performance reports, and tuning recommendations (before/after, methodology, scaling curves)
- Develop and maintain reference implementations, examples, and/or patches to customer code to enable performance and portability
- Support customer engagements (POCs to production), including debugging correctness/performance issues and advising on best practices for deployment (containers, schedulers, clusters)
- Collaborate with internal teams to file actionable issues, validate fixes, and influence roadmap based on real customer requirements in Energy
- Build internal libraries and reusable code that could feed into future NVIDIA products
Requirements:
- BS/MS (or equivalent experience) in CS/CE/EE/Physics/Applied Math or a related field
- Strong programming skills in C/C++ and Python on Linux
- Hands-on experience with CUDA programming and GPU performance optimization concepts
- Experience profiling and debugging performance using tools such as NVIDIA Nsight Systems / Nsight Compute (or equivalent)
- Understanding of parallel computing and performance fundamentals (vectorization, threading, NUMA, memory bandwidth/latency)
- Ability to communicate technical findings clearly to both engineers and non-engineers
- 5+ years of relevant experience in GPU/HPC optimization, with a strong track record of delivered speedups and scaling improvements
- Experience leading performance reviews with customer stakeholders and creating reusable playbooks/reference designs
- HPC experience with MPI, distributed systems, and multi-node performance tuning
- Energy/HPC domain exposure:
  - Seismic processing pipelines, RTM/FWI-style patterns, FFT/stencil/linear-algebra-heavy codes
  - Reservoir simulation (sparse/iterative solvers), preconditioning, domain decomposition
  - Power grid simulation / transient stability / optimization workflows
- Experience with CI/perf regression testing, containerized workflows (Docker/Apptainer), and schedulers (Slurm)
- Familiarity with AI workflows used alongside simulation (data prep, training/inference integration, pipeline performance)