Design and lead the build-out of an agentic platform operating model — where AI agents (Claude, GitHub Copilot, and custom agents) are the primary interface between product teams and cloud infrastructure
Replace manual ticketing workflows with agent-driven request handling: developers describe what they need in natural language or via CLI, and agents generate, validate, and apply the required Terraform or configuration changes
Build agent workflows that guide product teams through infrastructure onboarding, access requests, environment bootstrapping, and compliance checks — without requiring Cloud Platform team intervention
Establish GitHub as the operational backbone: issues, PRs, documentation, and agent interactions all flow through a GitHub-native model
Instrument agents with awareness of platform standards, security guardrails, and organizational context — so they enforce policy automatically rather than escalating to humans
Define and communicate the agentic roadmap to senior leadership, engineering teams, and product stakeholders
Own the security posture of the cloud platform layer — ensuring identity, access, and network controls are implemented consistently and enforced through automation across GCP, AWS, and Azure
Implement and maintain security guardrails at the organization and pipeline levels, ensuring all infrastructure provisioned through the platform meets baseline security and compliance requirements
Lead IAM governance: role binding, access provisioning, key rotation, service account hygiene, and Workload Identity Federation — with a goal of automating these controls through agents and policy-as-code
Partner with the Security team to ensure platform capabilities align with organizational security standards and support audit requirements (SOC 2, PIPEDA, HIPAA-aligned practices)
Build security into the self-service golden paths — so that teams provisioning infrastructure through approved patterns inherit secure defaults automatically
Treat security findings as engineering problems: prioritize remediation through code, automation, and agent enforcement rather than manual review cycles
Design opinionated “golden path” frameworks using Terraform, Terragrunt, and GitHub Actions that standardize and secure infrastructure patterns across GCP, AWS, and Azure
Build and maintain a centralized module marketplace and IaC library that teams and agents can consume confidently
Ensure all self-service capabilities are agent-accessible — designed for both human and programmatic consumption from day one
Establish clear support boundaries: teams using the golden path get full support; non-standard configurations are self-supported
Ensure operational coverage across the multi-cloud estate: GCP, AWS, and Azure
Lead incident management with a focus on durable remediation — every significant incident produces agent runbooks, automation, or documentation that prevents recurrence
Drive down request volume through agentic self-service, not headcount scaling — treating high ticket volume as an engineering problem to be automated away
Coordinate with the SRE and observability teams to ensure platform services meet reliability expectations and incidents are routed and resolved efficiently
Build and maintain CI/CD pipelines and Infrastructure-as-Code to automate provisioning, configuration management, patching, and compliance enforcement
Contribute to the golden image factory initiative — ensuring CIS-hardened, patched base images are available on-demand across all cloud platforms
Champion a “security as code” mindset across the team — policy enforcement, compliance checks, and access controls are implemented in pipelines and agents, not spreadsheets
Manage a blended team of platform engineers and cloud operations engineers, with a deliberate focus on growing agent-building, automation, and security engineering skills
Hire for engineers who are energized by building AI-driven, security-first systems — not just operating existing ones
Foster a learning culture — create space for the team to grow in agentic development, cloud security, certifications, and IaC alongside day-to-day responsibilities
Help shape and evolve team ceremonies and ways of working and contributing to how the team structures its delivery cadence, retrospectives, and planning without being the sole driver of execution
Partner with Product, Engineering, Security, and Architecture teams to align platform and agentic capabilities with organizational priorities
Serve as the internal champion for agentic workflows — helping product and engineering teams understand how to interact with the platform through agents rather than manual processes
Report on platform adoption, agent utilization, security posture, and toil-reduction progress to senior leadership
Requirements
5+ years of progressive experience in cloud platform engineering or cloud operations — with at least 2 years in a people management or technical leadership role
A genuine belief in agentic-first, security-first workflows and a track record of building automation that replaces manual processes — not just augments them
Experience leading teams through transformation: from reactive, ticket-driven operations toward proactive, agent-driven platform delivery
Strong communication skills — able to translate platform complexity into clear narratives for executive leadership and business stakeholders
Hands-on experience across at least two of GCP, AWS, and Azure — with a solid grasp of identity, networking, compute, and security controls at scale
Deep expertise in Infrastructure-as-Code (Terraform, Terragrunt) and the ability to design secure, reusable, opinionated module libraries
Experience building or working with AI agents and agentic workflows — including prompt engineering, tool use, and integrating agents with CI/CD systems and infrastructure APIs
Strong understanding of cloud security fundamentals: IAM, RBAC, service accounts, Workload Identity Federation, network security, and secrets management
Experience implementing policy-as-code and automated compliance enforcement in multi-cloud environments
Proficiency in at least one scripting/programming language (Python, Go, Bash) — you write code, not just YAML
Experience building developer-facing self-service platforms, including CLI tools, GitHub Actions workflows, and chat-based interfaces.
Familiarity with request and workflow management practices — and an instinct for treating high request volume as an engineering problem to be automated away
Understanding of security and compliance requirements in regulated healthcare environments (SOC 2, HIPAA-aligned practices, PIPEDA)
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Python
Terraform
Go
Benefits
Comprehensive total rewards package highlighting competitive salary and bonus structures
Minimum 3 weeks of vacation
Flexible benefits plan to meet the needs of you and your family
Flexibility to work in-office, virtually or a combination of both, based on the role's requirements
Generous company matched pension and share purchase programs
Opportunity to give back to communities in which we work, live and serve
Career growth and learning & development opportunities to develop your skills