Cadence is a clinical AI company focused on delivering continuous care for older adults with chronic conditions. The Staff DevOps Engineer will design, operate, and scale the cloud infrastructure that supports care delivery for over 100,000 patients, ensuring reliability, security, and observability.
Responsibilities:
- Own the design and continuous improvement of Cadence's cloud infrastructure, driving reliability, scalability, and secure software delivery across all environments
- Maintain and mature core Kubernetes services, including resilient networking, autoscaling, monitoring, and well-architected patterns for production workloads that support real-time patient monitoring
- Lead Terraform-based infrastructure-as-code practices, including authoring, reviewing, and enforcing standards for AI-generated IaC to ensure correctness and security before deployment
- Define and enforce DevSecOps controls across clusters, including least-privilege IAM, container image scanning, and runtime policy, so that infrastructure meets the compliance requirements of a regulated healthcare environment
- Sharpen observability practices using tools like Datadog, improving alerting, incident response, and the feedback loops that keep clinical systems available and performant
- Manage infrastructure spend with discipline, identifying and resolving cost inefficiencies without compromising system resilience
- Mentor engineers across the team on infrastructure best practices, raising the technical bar in pull requests, runbooks, and production operations
Requirements:
- 8+ years of hands-on DevOps or platform engineering experience, with demonstrated ownership of production cloud infrastructure at scale
- Deep experience with AWS and Kubernetes, including designing, operating, and debugging production clusters under real load
- Proficiency with Terraform, Helm, and CI/CD pipelines using GitHub Actions or comparable tooling
- Strong command of observability tooling, including Datadog or equivalent platforms, with experience building alerting systems and leading incident response
- Experience in healthcare or another highly regulated industry, with working knowledge of relevant compliance and security requirements
- Track record of mentoring engineers and raising infrastructure standards across a team
- Fluency with LLM APIs, prompt engineering, and AI-assisted development tools; demonstrated experience building or evaluating AI-powered systems in production