Ethos is a well-funded Series A startup focused on transforming training to align with strategic business outcomes, serving over 150 enterprise customers. They are looking for a Senior/Staff DevOps Engineer to lead the deployment and operationalization of their SaaS products, enhance DevOps practices, and shape platform engineering strategy, particularly with AI tooling and complex data pipelines.

Responsibilities:

Design & Operate the Platform: Architect, implement, and run secure, scalable, multi-tenant infrastructure (infra as code, immutable artifacts, GitOps)
AI-Augmented Operations & Platform Work: Use AI coding and agentic tools (Claude Code, Cursor, Copilot, MCP-based ops agents) for IaC authoring, pipeline development, log/trace analysis, postmortem drafting, and toil reduction; build and improve agentic workflows for the team
CI/CD & Release Engineering: Build and harden pipelines (build, test, scan, sign, promote, deploy) for multi-environment delivery—including disconnected/air-gapped workflows
Observability & Reliability: Establish SLOs; instrument systems for metrics/logs/traces; drive incident response and postmortems; reduce MTTR and change failure rate
Security & Compliance by Design: Integrate supply-chain security (SBOMs, signing, provenance), secrets management, and baseline hardening (CIS/STIG-aligned)
Cost & Performance: Optimize infrastructure spend and performance (capacity planning, autoscaling, right-sizing, storage/egress strategies)
Technical Leadership: Lead design reviews, author RFCs, mentor engineers, and raise the quality bar for platform changes
Gov/Constrained Deployments: Support IL-4/IL-5-aligned patterns, RMF documentation, and offline artifact promotion processes where needed
(Staff) Strategy & Standards: Define platform roadmaps, establish consistent deployment and infrastructure patterns, and guide cross-team adoption of best practices

Requirements:

5+ years building and operating cloud platforms; 3+ years deploying SaaS in production
Strong with Terraform, Helm/Kustomize, and containers (Docker, Kubernetes)
Deep AWS experience (e.g., VPC, EKS, EC2, S3, RDS, ECR, IAM/KMS, Route 53; CloudFront desirable)
CI/CD expertise (e.g., GitHub Actions, CircleCI, or Argo Workflows) and GitOps (Argo CD or Flux)
Observability across metrics, logs, and traces (e.g., Prometheus/Grafana, OpenTelemetry, ELK)
Proven track record in IaC, scalable system design, and quality tooling (automated tests, canaries/blue-green, feature flags)
Excellent communication; comfortable partnering with Product, Security, and Customer teams
Thrives in a startup environment—ownership, autonomy, and pragmatic delivery
Active, fluent use of AI development/operations tools as part of your daily workflow
Secret clearance, or eligibility and willingness to obtain one
Supply-chain security (SBOMs, SLSA concepts, image signing, provenance) and vulnerability management (e.g., Trivy/Grype, Snyk; Chainguard experience a plus)
Experience identifying/mitigating CVEs and setting policy thresholds
Background with DoD/regulated customers; familiarity with IL-4/IL-5, Platform One patterns, and RMF documentation workflows
Knowledge of STIG/CIS hardening, air-gapped architectures, and offline update mechanisms
Experience operating AI/ML workloads in production (GPU scheduling, model artifact management, inference serving, vector DBs, queuing/streaming) or building agentic ops workflows / MCP-based integrations (alert triage, runbook automation, IaC review agents)
