Renaissance Learning is a global leader in pre-K–12 education technology, providing solutions that help educators enhance student learning experiences. They are seeking an experienced Sr Site Reliability Engineer to join their Engineering Enablement group, focusing on improving application and infrastructure reliability and security while supporting their SaaS platform used by millions of students.

Responsibilities:

Work with engineering, security & governance teams to improve the observability, reliability, resiliency, and auditability of our systems and to minimize or prevent downtime
Contribute to infrastructure-as-code using Terraform & CloudFormation
Support CI/CD pipelines that ensure the prompt release of high-quality software
Collaborate with cross-functional teams to resolve infrastructure issues
Perform Disaster Recovery exercises on our products
Explore and integrate AI tooling into the SRE workflows
Be part of an on-call rotation & support off-hours incidents & deployments
Give constructive feedback through coaching, even without direct reports

Requirements:

5+ years of experience focused on SRE
Experience in managing & monitoring containerized cloud environments in production, preferably AWS EKS
Experience with IaC, Configuration Management and Orchestration Tools like Terraform/Docker/Ansible
Hands-on experience in a programming or scripting language such as .NET/Java, Python, or JavaScript
On-call experience & willingness to be on call during non-work hours and weekends
Experience working in an agile environment
BS in Information Systems or Computer Science, related field experience, or both
Managing Kubernetes clusters (EKS) at scale using Helm
Setting up GitLab & GitHub pipelines & workflows
Experience setting up Monitoring, Logging, Alerting & Observability in tools such as New Relic, Datadog, Grafana, CloudWatch, PagerDuty
Experience w/Teleport, HashiCorp Boundary, etc.
Experience w/Redshift, OpenSearch/zero-ETL
Experience running Disaster Recovery exercises
Implementing service level objectives, indicators, and agreements (SLOs/SLIs/SLAs) & error budgets
Experience using Claude Code for agentic coding and agentic SDLC, and enabling/rolling out agentic DX
