Work with engineering, security, and governance teams to improve the observability, reliability, resiliency, and auditability of our systems and to minimize or prevent downtime.
Contribute to infrastructure-as-code using Terraform & CloudFormation.
Support CI/CD pipelines that ensure the prompt release of high-quality software.
Collaborate with cross-functional teams to resolve infrastructure issues.
Perform Disaster Recovery exercises on our products.
Explore and integrate AI tooling into the SRE workflows.
Be part of an on-call rotation and support off-hours incidents and deployments.
Demonstrate strong skills in giving constructive feedback through coaching, even without direct reports.
Requirements
5+ years of experience focused on SRE
Experience managing and monitoring containerized cloud environments in production, preferably AWS EKS
Experience with IaC, Configuration Management and Orchestration Tools like Terraform/Docker/Ansible
Hands-on experience with at least one programming or scripting language or platform, such as .NET, Java, Python, or JavaScript
On-call experience and willingness to be on call outside work hours and on weekends
Experience working in an agile environment.
Tech Stack
Ansible
AWS
Cloud
Docker
Java
JavaScript
Python
Terraform
.NET
Benefits
World-Class Health Benefits: Medical, Prescription, Dental, Vision, Telehealth
Health Savings and Flexible Spending Accounts
401(k) and Roth 401(k) with company match
Paid Vacation and Sick Time Off
12 Paid Holidays
Parental Leave (20 total weeks with 14 weeks paid) & Milk Stork program