VARITE INC is a global staffing and IT consulting company providing technical consulting and team augmentation services. They are seeking a Senior DevOps & Site Reliability Engineer to manage and optimize their hybrid cloud infrastructure, focusing on automation and observability across AWS and Azure.
Responsibilities:
- Multi-Cloud Infrastructure Management: Design, deploy, and manage scalable, fault-tolerant infrastructure and services across both Amazon Web Services (AWS) and Microsoft Azure
- Infrastructure as Code (IaC): Develop and maintain infrastructure automation using tools like Terraform, AWS CloudFormation, or Azure ARM Templates/Bicep, ensuring consistency and repeatability across cloud platforms
- Well-Architected Implementation: Ensure all infrastructure designs and implementations adhere to the principles of the AWS Well-Architected Framework and the Azure Well-Architected Framework, covering operational excellence, security, reliability, performance efficiency, and cost optimization
- CI/CD Pipeline Management: Build, maintain, and optimize robust CI/CD pipelines (using Jenkins, GitLab CI, AWS CodePipeline, Azure DevOps) to facilitate rapid and reliable application deployment
- Monitoring & Alerting: Implement comprehensive monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Datadog, AWS CloudWatch, Azure Monitor, Splunk) to proactively identify and resolve production issues
- Incident Response: Lead incident response efforts, conduct root cause analyses (RCA), and implement preventative measures to improve system reliability and reduce Mean Time To Recovery (MTTR)
- Performance Tuning: Monitor application and infrastructure performance, identify bottlenecks, and implement optimizations to ensure high availability and responsiveness
- SLOs/SLAs: Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to meet agreed-upon Service Level Agreements (SLAs)
- Security Automation: Implement security best practices into the DevOps workflow (DevSecOps), automating security checks and managing access controls within both AWS (IAM) and Azure (Azure AD, RBAC)
- Collaboration: Work closely with software development, QA, and product teams to ensure smooth releases, troubleshoot production issues, and foster a culture of shared ownership
Requirements:
- Experience: 9 to 12 years
- Extensive Experience: 5+ years of experience in DevOps, SRE, or a similar cloud infrastructure role, with substantial experience managing both AWS and Azure environments
- Multi-Cloud Expertise: Deep working knowledge of core services in both AWS (EC2, S3, RDS, VPC, Lambda, etc.) and Azure (VMs, Blob Storage, Azure SQL DB, VNet, App Services, AKS, etc.)
- Automation & Scripting: Strong scripting skills in Python, Bash, or Go, and hands-on experience with IaC tools (Terraform strongly preferred)
- CI/CD Tools: Proficiency with multi-cloud CI/CD platforms and version control systems (Git, Jenkins, GitLab CI, Azure DevOps, AWS CodePipeline)
- Observability Stack: Experience setting up and managing modern logging, monitoring, and alerting tools across cloud platforms
- Architecture Knowledge: Practical experience implementing and auditing systems against the AWS and Azure Well-Architected Frameworks
- Required AI Skills: All contractor resources are expected to demonstrate baseline proficiency in enterprise-approved AI tools as part of their day-to-day responsibilities:
  - Consistent Use: Maintain a minimum of 90% weekly usage of AI tools such as GitHub Copilot, Microsoft 365 Copilot, and other GenAI platforms approved by the enterprise
  - Applied Productivity: Leverage AI tools to enhance coding, documentation, data analysis, and decision-making workflows
  - Continuous Learning: Stay current with evolving AI capabilities and features, and apply them to improve delivery quality and velocity
- Cloud Certifications (e.g., AWS Certified DevOps Engineer – Professional, Microsoft Certified: Azure DevOps Engineer Expert)
- Experience with containerization and orchestration (Docker, Kubernetes/EKS/AKS)
- Experience with configuration management tools (Ansible, Chef, Puppet)