Caris Life Sciences is dedicated to transforming cancer care and improving lives through precision medicine. They are seeking a Senior Staff Cloud DevOps Engineer to lead the architecture, implementation, and optimization of scalable cloud-native infrastructure, focusing on AWS and Kubernetes, while mentoring junior engineers and driving technical transformations.
Responsibilities:
- Architect and lead the implementation of large-scale, complex Kubernetes environments on AWS EKS, ensuring enterprise-grade availability, scalability, and security. Design and implement advanced features including multi-cluster management, service mesh architectures, and custom controllers
- Drive strategic proof-of-concept (PoC) initiatives for emerging technologies, evaluating their potential impact on the organization's technical landscape and business goals
- Lead the development of enterprise-wide standards for Kubernetes cluster lifecycle management, including upgrade strategies, security policies, and performance optimization frameworks
- Provide expert-level consultation to cross-functional teams on cloud-native application design, microservices architecture, and container orchestration strategies
- Architect and implement comprehensive, multi-layered observability solutions integrating metrics, logs, and traces. Develop predictive monitoring capabilities and establish organization-wide observability standards
- Lead the adoption of advanced Infrastructure as Code (IaC) practices and oversee the implementation of sophisticated CI/CD pipelines, including modular design patterns, reusable components, and automated testing frameworks for efficient application deployment and image management across multiple environments
- Develop and execute the organization's cloud security strategy, including the implementation of zero-trust architectures, automated compliance checks, and security-as-code practices. Translate complex regulatory requirements (e.g., SOX) into actionable cloud implementation plans
- Spearhead large-scale cloud migration and application modernization initiatives, collaborating with business stakeholders to align technical solutions with organizational objectives
- Identify and implement strategic opportunities for process improvements and automation across the DevOps and cloud infrastructure landscape, developing and executing long-term roadmaps to drive continuous enhancement of cloud operations and DevOps practices
- Lead critical incident response efforts and participate in on-call rotations to support and improve the reliability of critical cloud infrastructure
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or related field
- 9+ years of experience in DevOps or Site Reliability Engineering roles
- 7+ years of hands-on experience with AWS services and cloud architecture
- 7+ years of hands-on experience with Kubernetes, including deep expertise in cluster management, troubleshooting, and optimization
- Expert-level proficiency in at least one programming language (e.g., Python, Go) with a track record of developing and maintaining production-grade software
- Extensive experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation, AWS CDK) and ability to design scalable, modular IaC frameworks
- Deep understanding of containerization technologies and orchestration platforms, including security best practices and performance optimization techniques
- Proven experience in designing and implementing enterprise-grade CI/CD pipelines and DevOps workflows, preferably with GitLab CI
- Expert-level knowledge of networking concepts and implementation in complex cloud environments, including hybrid and multi-cloud architectures
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex distributed systems and provide innovative solutions
- Proven ability to lead PoC initiatives and evaluate new technologies
- Demonstrated experience in creating and maintaining technical documentation and knowledge bases
- Excellent communication skills, with the ability to effectively convey complex technical concepts to both technical and non-technical stakeholders, including executive leadership
- Proficiency in the Microsoft Office Suite, specifically Word, Excel, and Outlook, and general working knowledge of the Internet for business use
- AWS Professional level certifications (e.g., Solutions Architect Professional, DevOps Engineer Professional)
- Kubernetes certifications (e.g., CKA, CKAD, CKS)
- Experience with multiple cloud platforms (e.g., AWS, GCP) for multi-cloud architectures
- Deep expertise in database technologies, including the design and optimization of distributed database systems such as MySQL, PostgreSQL, and serverless platforms
- Proficiency with specific monitoring and observability tools such as Prometheus, Grafana, and the ELK stack
- Hands-on experience with configuration management tools (e.g., Ansible, Chef, Puppet)
- Experience in implementing knowledge management systems or tools in a DevOps environment
- Contributions to open-source projects or personal projects demonstrating cloud expertise