Caris Life Sciences is dedicated to transforming cancer care and improving lives through precision medicine and innovation. They are seeking a Senior Staff Cloud DevOps Engineer to lead the design and implementation of scalable cloud-native infrastructure, focusing on AWS and Kubernetes while driving technical transformations and mentoring junior engineers.
Responsibilities:
- Architect and lead the implementation of large-scale, complex Kubernetes environments on AWS EKS, ensuring enterprise-grade availability, scalability, and security. Design and implement advanced features including multi-cluster management, service mesh architectures, and custom controllers
- Drive strategic proof-of-concept (PoC) initiatives for emerging technologies, evaluating their potential impact on the organization's technical landscape and business goals
- Lead the development of enterprise-wide standards for Kubernetes cluster lifecycle management, including upgrade strategies, security policies, and performance optimization frameworks
- Provide expert-level consultation to cross-functional teams on cloud-native application design, microservices architecture, and container orchestration strategies
- Architect and implement comprehensive, multi-layered observability solutions integrating metrics, logs, and traces. Develop predictive monitoring capabilities and establish organization-wide observability standards
- Lead the adoption of advanced Infrastructure as Code (IaC) practices and oversee the implementation of sophisticated CI/CD pipelines and GitOps-based continuous delivery (e.g., ArgoCD), including modular design patterns, reusable components, and automated testing frameworks for efficient application deployment and image management across multiple environments
- Develop and execute the organization's cloud security strategy, including the implementation of zero-trust architectures, automated compliance checks, and security-as-code practices. Translate complex regulatory and compliance requirements (e.g., HIPAA, SOC2, SOX) into actionable cloud implementation plans
- Spearhead large-scale cloud migration and application modernization initiatives, collaborating with business stakeholders to align technical solutions with organizational objectives
- Identify and implement strategic opportunities for process improvements and automation across the DevOps and cloud infrastructure landscape, developing and executing long-term roadmaps to drive continuous enhancement of cloud operations and DevOps practices
- Lead critical incident response efforts and participate in on-call rotations to support and improve the reliability of critical cloud infrastructure
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or related field
- 9+ years of experience in DevOps or Site Reliability Engineering roles
- 7+ years of hands-on experience with AWS services and cloud architecture
- 7+ years of hands-on experience with Kubernetes, including deep expertise in cluster management, troubleshooting, and optimization
- Expert-level proficiency in at least one programming language (e.g., Python, Go) with a track record of developing and maintaining production-grade software
- Extensive experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation, AWS CDK) and ability to design scalable, modular IaC frameworks
- Deep understanding of containerization technologies and orchestration platforms, including security best practices and performance optimization techniques
- Proven experience designing and implementing enterprise-grade CI/CD pipelines and DevOps workflows, preferably with hands-on GitLab CI/CD expertise (pipelines, runners, modular templates)
- Expert-level networking skills in complex AWS environments; hybrid/multi-cloud experience a plus
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex distributed systems and provide innovative solutions
- Proven ability to lead PoC initiatives and evaluate new technologies
- Demonstrated experience in creating and maintaining technical documentation and knowledge bases
- Excellent communication skills, with the ability to effectively convey complex technical concepts to both technical and non-technical stakeholders, including executive leadership
- Proficient in Microsoft Office Suite, specifically Word, Excel, and Outlook, with a general working knowledge of the Internet for business use
- Hands-on experience leveraging AI-assisted development tools (e.g., Claude, Cursor) to accelerate engineering work, including building automations, agents, and reusable skills/workflows that improve team productivity and reduce operational toil
- Hands-on experience with GitOps-based continuous delivery using ArgoCD (app-of-apps, sync/health management, Helm/Kustomize deployments to Kubernetes)
- AWS Professional level certifications (e.g., Solutions Architect Professional, DevOps Engineer Professional)
- Kubernetes certifications (e.g., CKA, CKAD, CKS)
- Experience with multiple cloud platforms (e.g., AWS, GCP) for multi-cloud architectures
- Deep expertise in database technologies, including the design and optimization of distributed database systems such as MySQL, PostgreSQL, and serverless platforms
- Proficiency with monitoring and observability tools such as Datadog and AWS CloudWatch
- Hands-on experience with configuration management tools (e.g., Ansible, Chef, Puppet)
- Experience in implementing knowledge management systems or tools in a DevOps environment
- Contributions to open-source projects or personal projects demonstrating cloud expertise