DataDirect Networks (DDN), a global leader in AI and multi-cloud data management at scale, is seeking a highly experienced Senior Staff Engineer specializing in AI Data Path & Storage. The role leads the development and integration of advanced storage systems with AI inference pipelines, with a focus on high-performance data movement and system optimization.

Responsibilities:

Lead the design and implementation of high-performance data movement pipelines using NVIDIA NIXL across GPU, CPU, and storage tiers
Architect and drive integration of DDN Infinia with GPU-accelerated inference platforms for large-scale, real-time AI workloads
Own end-to-end optimization of I/O paths between GPU memory and storage using technologies such as NVIDIA GPUDirect Storage, RDMA, and NVMe-over-Fabrics
Define and implement multi-tier storage architectures (NVMe, SSD, object storage) optimized for inference latency, throughput, and scalability
Lead development of advanced KV cache management strategies, including offloading, prefetching, and persistence across distributed storage layers
Partner with AI/ML engineering teams to optimize inference performance in frameworks such as PyTorch and TensorFlow
Establish benchmarking frameworks and lead performance tuning efforts for storage and data movement in production inference environments
Diagnose and resolve complex system bottlenecks across storage, networking, and GPU subsystems
Influence architecture decisions for distributed inference systems, ensuring scalability, resilience, and efficient data locality
Drive engineering excellence through best practices in observability, performance monitoring, automation, and reliability engineering
Mentor junior engineers and provide technical leadership across cross-functional teams

Requirements:

Bachelor's or Master's degree in Computer Science, Engineering, or a related field
12+ years of experience in storage systems, distributed systems, or performance engineering
Proven track record of architecting and delivering large-scale, high-performance infrastructure systems
Deep expertise in distributed storage architectures (object storage, scalable file systems, or cloud-native storage platforms)
Strong understanding of the Linux I/O stack, filesystem internals, and storage protocols
Extensive hands-on experience with NVMe, SSD optimization, and high-performance storage environments
Strong experience with RDMA, InfiniBand, or other high-speed data transfer technologies
Solid understanding of GPU computing concepts and CPU–GPU data movement patterns
Proficiency in Python and/or C/C++, with advanced debugging, profiling, and performance tuning skills
Demonstrated ability to optimize latency-sensitive, high-throughput production systems
Hands-on experience with NVIDIA NIXL or similar data movement frameworks
Experience with GPU-aware storage pipelines and GPUDirect Storage
Strong understanding of AI inference systems, LLM serving architectures, and KV cache optimization
Experience with Retrieval-Augmented Generation (RAG) pipelines and open vector search ecosystems
Background in high-performance computing (HPC) or hyperscale distributed environments
Expertise in caching strategies, memory tiering, and data locality optimization
Experience designing disaggregated compute and storage architectures

Senior Staff Engineer - AI Data Path
