Sprinter Health is reimagining how people access care by bringing it directly to their homes. They are seeking a Staff Site Reliability Engineer to build the reliability, infrastructure, and security foundations that power last-mile healthcare delivery at scale, focusing on operational efficiency and system resilience.

Responsibilities:

Design, build, and improve the infrastructure that powers Sprinter’s patient care, clinician operations, internal tooling, and partner-facing systems
Improve reliability across distributed systems, cloud infrastructure, CI/CD, observability, and incident response
Raise the security baseline across cloud infrastructure, access controls, secrets management, identity, and operational workflows
Build and maintain infrastructure as code using Terraform and related tooling
Automate manual infrastructure and operational processes through scripting, tooling, and platform improvements
Partner with engineering teams to improve system architecture, deployment practices, monitoring, logging, and alerting
Troubleshoot complex issues across infrastructure, application, data, and operational boundaries
Help define reliability, security, and infrastructure standards that allow Sprinter to scale without creating brittle systems
Support incident response practices, postmortems, operational readiness, and continuous improvement across engineering
Make pragmatic tradeoffs between reliability, security, speed, and simplicity in a fast-moving startup environment

Requirements:

You've spent 8+ years in site reliability engineering, platform engineering, infrastructure engineering, security engineering, or related technical roles
You've led high-impact infrastructure, reliability, platform, or security projects end to end with minimal oversight
You've built and operated production systems in cloud environments, ideally AWS and/or GCP
You've worked deeply with infrastructure as code, ideally Terraform
You've improved observability, monitoring, logging, alerting, and incident response practices across engineering teams
You've automated infrastructure, deployment, or operational workflows using scripting languages such as Python, Bash, or TypeScript
You've improved cloud security, access management, secrets management, networking, or operational controls
You've troubleshot production issues across application, infrastructure, networking, and deployment layers
You've worked in environments where reliability, security, ambiguity, and speed all matter
You've made technical decisions that balanced immediate business needs with long-term scalability, reliability, and maintainability
You've built or scaled infrastructure in health tech, logistics, marketplace, fintech, or other operationally complex environments
You've worked in mid- or growth-stage startups where speed, ambiguity, and pragmatic decision-making were required
You have experience improving security posture in a practical, engineering-friendly way
You've helped establish reliability standards, incident response practices, or platform patterns across an engineering org
You're comfortable working directly with product engineers, data teams, operations, security stakeholders, and technical leadership
You have experience mentoring engineers and raising the operational bar across a broader engineering team
You've worked in regulated environments and understand the importance of privacy, security, and compliance best practices
You have people management experience or interest in growing into broader technical leadership over time

Staff, Site Reliability Engineer (SRE)
