NBCUniversal is one of the world's leading media and entertainment companies. The company is seeking a Principal DevOps Engineer to architect and evolve the platform that powers NBC’s broadcast production environments, with a focus on designing a Kubernetes-native platform and automating cloud infrastructure at enterprise scale.
Responsibilities:
- Architect a Kubernetes-native platform that models broadcast infrastructure as custom resources
- Lead the technical strategy leveraging Crossplane compositions and custom Go functions to automate provisioning across multi-account AWS environments and on-prem control rooms
- Design, build, and maintain production-grade Kubernetes operators, controllers, and internal platform APIs in Go
- Actively develop custom Crossplane providers to deeply integrate external enterprise platforms (such as NRCS, Venafi, and Infoblox) into our control plane, managing resource lifecycles and approval workflows
- Lead the design of cloud networking, DNS strategies, and cross-account connectivity across hybrid environments, automating VPC topology and dynamic network routing
- Partner closely with broadcast systems engineers, system integrators, and external vendors to bridge the gap between broadcast hardware and automated infrastructure
- Lead efforts to 'Puppet-ize' bare-metal compute configurations and integrate proprietary vendor solutions into our configuration-as-code ecosystem
- Serve as a technical authority for the team
- Write RFCs, drive architectural decisions, mentor engineers, and establish high-confidence CI/CD pipelines, testing strategies, and GitHub Actions automation
- Own the platform's authorization model, designing hierarchical RBAC systems, resource identifier schemes, and identity integrations that enforce fine-grained access control
- Drive GitOps-based continuous delivery (Flux, Kustomize, Helm) and manage configuration-as-code for compute fleets using Puppet
- Ensure deep operational visibility by designing comprehensive observability and alerting stacks
- Oversee the integration of remote desktop/VDI connectivity solutions, focusing on session authentication, credential management, and gateway routing
Requirements:
- 10+ years of experience designing, building, and operating production infrastructure and cloud-native platforms at enterprise scale
- Strong proficiency in Go (systems-level programming, API servers) and deep experience building Kubernetes controllers/operators with frameworks such as controller-runtime and Kubebuilder
- Expert-level knowledge of the Kubernetes ecosystem, including CRD/XRD generation, operators, informers, admission webhooks, and RBAC
- Deep production experience with Crossplane, including composite resources, composition functions, and specifically developing custom Crossplane providers in Go to integrate external enterprise platforms
- Extensive production experience with AWS multi-account architectures, cross-account networking patterns, and identity federation. Requires depth across EKS, EC2, VPC, IAM, STS, SSM, Secrets Manager, Route 53, and S3
- Production experience with GitOps tooling, specifically Flux (HelmRelease, Kustomization) or Argo CD, for continuous delivery on Kubernetes
- Hands-on experience with Puppet, including module development, PuppetDB, Hiera, and r10k
- Experience designing REST APIs with middleware patterns and modern authentication (OAuth/JWT), plus a keen eye for information security, including cross-account IAM trust chains, least-privilege policies, JWT lifecycles, and secrets abstraction
- Strong background in designing telemetry platforms using Grafana, Prometheus/Mimir, Loki, OpenTelemetry, and metrics collection agents (Alloy, Prometheus Node Exporter)
- Working knowledge of PostgreSQL, SQLite, or similar relational databases, including schema design, migrations, and query optimization
- Excellent problem-solving skills with a proven ability to present architectural decisions to executives, engage with vendors, and write clear technical documentation
- Familiarity with broadcast/media production workflows and the strict operational constraints of live production environments
- Experience with the Crossplane function SDK for building custom composition functions in Go, and with Kubernetes disaster recovery (Velero backups and cluster restoration)
- Familiarity with VDI solutions (NICE DCV, Leostream, PCoIP, etc.), machine identity workflows, and PKI certificate management (Venafi or similar)
- Experience with hybrid DNS architectures (Infoblox), software-defined networking (VPC peering, Transit Gateway, Direct Connect, Cloud WAN), and Envoy Gateway / Gateway API
- Familiarity with advanced testing frameworks (k6, KUTTL, etc.), SOPS for encrypted GitOps configurations, and local development workflows (Air, kind/Colima)
- Ability to script routine tasks in Bash and PowerShell
- Active contributions to open-source projects, particularly within the CNCF / Kubernetes ecosystem