NVIDIA is a leader in visual computing and AI technology, dedicated to advancing energy simulation and AI workflows. The Developer Technology Engineer will focus on optimizing CUDA performance for energy-related workloads and collaborate with engineering teams to enhance GPU performance.

Responsibilities:

Profile, analyze, and optimize GPU-accelerated applications with emphasis on CUDA kernels, memory movement, concurrency, and end-to-end throughput
Drive performance improvements across the stack:
CUDA C++ kernel optimization, launch configuration, memory hierarchy, streams/events
GPU libraries (as applicable): cuBLAS, cuFFT, cuSPARSE, cuSOLVER, NCCL
Multi-GPU and multi-node scaling using MPI + NCCL, CPU/GPU overlap, communication patterns
Build reproducible benchmarks, performance reports, and tuning recommendations (before/after, methodology, scaling curves)
Develop and maintain reference implementations, examples, and/or patches to customer code to enable performance and portability
Support customer engagements (POCs to production), including debugging correctness/performance issues and advising on best practices for deployment (containers, schedulers, clusters)
Collaborate with internal teams to file actionable issues, validate fixes, and influence roadmap based on real customer requirements in Energy
Build internal libraries and reusable code that can feed into future NVIDIA products

Requirements:

BS/MS (or equivalent experience) in CS/CE/EE/Physics/Applied Math or related field
Strong programming skills in C/C++ and Python on Linux
Hands-on experience with CUDA programming and GPU performance optimization concepts
Experience profiling and debugging performance using tools such as NVIDIA Nsight Systems / Nsight Compute (or equivalent)
Understanding of parallel computing and performance fundamentals (vectorization, threading, NUMA, memory bandwidth/latency)
Ability to communicate technical findings clearly to both engineers and non-engineers
5+ years relevant experience in GPU/HPC optimization; strong track record of delivered speedups and scaling improvements
Experience leading performance reviews with customer stakeholders and creating reusable playbooks/reference designs
HPC experience with MPI, distributed systems, and multi-node performance tuning
Energy/HPC domain exposure:
Seismic processing pipelines, RTM/FWI-style patterns, FFT/stencil/linear algebra heavy codes
Reservoir simulation (sparse/iterative solvers), preconditioning, domain decomposition
Power grid simulation / transient stability / optimization workflows
Experience with CI/perf regression testing, containerized workflows (Docker/Apptainer), and schedulers (Slurm)
Familiarity with AI workflows used alongside simulation (data prep, training/inference integration, pipeline performance)

Developer Technology Engineer, Energy
