Lead, coach, and develop a high-performing team of SRE and Platform Specialists responsible for the reliability, scalability, security, and operational excellence of CIRA's registry platforms and supporting technology services.
Define and execute the platform and site reliability strategy, aligning priorities and investments with organizational objectives and customer needs.
Drive the design, operation, and continuous improvement of scalable, resilient, cloud-native platforms using public cloud technologies such as AWS.
Champion automation, infrastructure as code, GitOps, CI/CD, and self-service platform capabilities to reduce manual effort, operational toil, and engineering bottlenecks.
Establish and continuously improve observability, monitoring, alerting, and dashboarding practices to provide clear visibility into platform health, service reliability, and customer-impacting issues.
Lead incident management for high-severity events, providing incident command, stakeholder communication, and root cause analysis, and driving follow-up actions that strengthen long-term platform resilience.
Collaborate with engineering, security, support, compliance, and business stakeholders to establish priorities, balance risk, and deliver platform improvements that support registry operations and organizational goals.

7+ years of progressive experience in Site Reliability Engineering (SRE), platform engineering, DevOps, infrastructure, or cloud operations, including hands-on experience with public cloud platforms such as AWS.
3+ years of experience leading, coaching, and developing technical teams in SRE, platform engineering, DevOps, infrastructure, or cloud operations.
Strong hands-on background with public cloud platforms, preferably AWS, including cloud-native architecture, networking, security, resilience, scalability, and cost-aware operations.
Experience leading teams that implement and operate infrastructure as code (IaC), GitOps, and automation practices to manage cloud infrastructure, platform services, and deployment workflows.
Strong understanding of CI/CD principles, release automation, and modern software delivery practices.
Experience with containerization and orchestration technologies such as Docker and Kubernetes.
Exceptional communication, collaboration, and stakeholder management skills, with the ability to translate complex technical concepts into clear business outcomes for both technical and non-technical audiences.

Manager, Platform & Site Reliability

Key skills