NVIDIA, a leader in AI and high-performance computing, is seeking a Principal Security Data Engineer to strengthen its Infrastructure Security Engineering team. The role builds the data backbone of NVIDIA's security control plane, with a focus on designing and operating the data pipelines, lakes, and analytics for security telemetry.
Responsibilities:
- Design, build, and operate the ingestion and transformation pipelines that collect security telemetry and asset inventory from dozens of heterogeneous sources, and normalize them into one canonical model
- Architect and run the storage layer: a data lake/lakehouse built on open formats, with the schema flexibility to absorb structured inventory, semi-structured telemetry, and unstructured logs without constant, breaking migrations
- Build the query and analytics layer that powers posture scoring, coverage and drift metrics, freshness monitoring, and multi-source correlation
- Treat the data platform as a high-value target, because it is. The data you store is a map of every host, every gap, and every credential path. You will engineer encryption at rest and in transit, fine-grained RBAC/ABAC, non-repudiable audit logging, data classification, network isolation, and verifiable retention and purge
- Build for stable identity, source attribution, append-only history, and honest coverage. Make a source going quiet a finding, not silence, so that every downstream number comes with a known confidence
- Partner with the security control plane team, the inventory systems, identity and endpoint teams, and broader NVIDIA data and security organizations to define data contracts early, so these systems converge by design
Requirements:
- 15+ years of experience designing, building, and operating production data pipelines, lakes, or lakehouses at high volume and throughput
- A strong software engineering background with the ability to write clean, maintainable, and well-tested code (e.g., Python, Go, Scala, SQL)
- Proven ability to design canonical schemas and data models that span many disparate sources and evolve over time without breaking the consumers that depend on them
- Hands-on experience with the modern data stack: streaming and batch processing, object storage, open table formats, and interactive query engines
- You design data systems that are themselves defensible. Access control, encryption, audit, and isolation are first-class concerns in your work
- A track record of making large, messy datasets genuinely useful—serving interactive analysts, dashboards, and downstream services with data they can trust and query at low latency
- Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
- Experience building SIEM or data-lake detection content, normalizing security logs into common schemas (e.g., OCSF, ECS), or engineering the data layer that feeds correlation and anomaly-detection systems
- Expertise building low-latency, near-real-time pipelines where a correlation is only as fast as its slowest input, and detection is measured in minutes
- Experience working with GPU and hardware telemetry (DCGM, Redfish/BMC, InfiniBand) or fleet-scale observability across hundreds of thousands of devices
- Experience engineering the data and feature layers that feed ML or LLM-based reasoning systems, enabling agents to correlate, predict, and act on trustworthy data