Amwell is transforming healthcare through technology and innovation. They are seeking a Senior Site Reliability Engineer to support key infrastructure within their data center and cloud environments, focusing on automation, operational tooling, and scalable systems management.
Responsibilities:
- Support production systems on platforms such as ESXi, Azure, AWS, and GCP
- Utilize configuration management tools for scalable and repeatable systems management including Ansible and Puppet
- Design, develop, and maintain automation frameworks, scripts, and operational tooling to improve scalability, reliability, and operational efficiency across infrastructure and platform services
- Configure, maintain, patch, and troubleshoot Linux operating systems with basic knowledge of Windows operating systems
- Ensure compliance with security and data handling policies to meet PCI, HIPAA, and other standards
- Develop and maintain Infrastructure-as-Code (IaC) solutions using tools such as Terraform, Ansible, and Puppet to support repeatable and standardized deployments
- Collaborate with peers as an accountable and supportive member of Amwell technology teams
- Participate in 24/7 call rotation and scheduled maintenance tasks
Requirements:
- 5 or more years of experience managing Linux based systems, certifications a plus
- Strong experience with Infrastructure-as-Code and configuration management technologies including Terraform, Ansible, Puppet, or similar automation frameworks
- Build automation workflows for system provisioning, patch management, monitoring, configuration management, incident response, and operational remediation
- Experience with on-prem and cloud based virtualization platform compute and storage, such as ESXi, Azure, AWS and GCP
- Experience with Elasticsearch/Logstash/Kibana analytics engine (ELK Stack)
- Experience managing Identity and Authentication solutions including LDAP, Active Directory, and Multi-Factor Authentication
- Strong scripting and software development skills using languages such as Python, Bash, or PowerShell, with experience building reusable automation tooling and operational integrations in hybrid cloud and on-premise environments
- Experience developing monitoring, alerting, and self-healing automation solutions
- Solid foundation of TCP/IP networking concepts
- Experience supporting large-scale production environments with an automation-first operational mindset