Senior Site Reliability Engineer at Tango | JobVerse
Tango
Remote
Senior Site Reliability Engineer
California, United States of America
Full Time
2 hours ago
$150,000 - $180,000 USD
Visa Sponsor
Key skills
Ansible
AWS
Azure
Cloud
DNS
Docker
Google Cloud Platform
Grafana
Java
Kubernetes
Linux
Prometheus
Python
Splunk
TCP/IP
Terraform
Go
GCP
Google Cloud
CloudFormation
Datadog
New Relic
Load Balancing
SaaS
CI/CD
Leadership
About this role
Role Overview
Own reliability outcomes for Tango’s cloud platform across production and non-production environments
Design, implement, and operate SLOs/SLIs, error budgets, and reliability reporting
Drive prioritization of reliability work with Engineering and Product
Build and maintain observability foundations: metrics, logging, tracing, dashboards, and alerting
Lead incident response and post-incident reviews
Engineer and evolve CI/CD and release safety practices
Improve infrastructure-as-code and environment consistency
Partner with Security and Compliance to support secure operations
Optimize cloud cost and capacity through right-sizing and performance tuning
Enable engineering teams with reliable internal tooling and automation
Mentor engineers on reliability best practices
Requirements
8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering supporting distributed SaaS applications
Strong background in Linux systems engineering
Networking fundamentals (TCP/IP, DNS, load balancing)
Proficiency with at least one programming language used for automation (e.g., Python, Go, or Java)
Strong scripting skills
Hands-on experience with cloud infrastructure (AWS, Azure, or GCP)
Deep experience with infrastructure-as-code and configuration management (e.g., Terraform, CloudFormation, Ansible)
Expertise in containerization and orchestration (Docker, Kubernetes)
Strong observability practices with tools such as Prometheus/Grafana, Datadog, New Relic, or ELK/Splunk
Incident management leadership with a focus on root cause analysis
Experience designing and operating CI/CD pipelines and release management practices
Ability to work cross-functionally with Engineering, Product, Support, and Security
Bachelor’s degree in Computer Science or Engineering, or equivalent practical experience
Relevant certifications are a plus (e.g., AWS/Azure/GCP, Kubernetes CKA/CKAD, ITIL)
Benefits
Competitive Compensation
Comprehensive benefits, including health, dental, and vision insurance
401(k) plan with company match
Generous paid time off
Flexible Work Environment
Inclusive & Collaborative Culture