Design, build, and run ADP and cloud services across AWS, GCP, and Azure, spanning both control plane and data plane components.
Translate ambiguous product requirements into robust technical designs and execution plans.
Lead large-scale projects and contribute to cross-functional architecture discussions.
Anticipate technical hurdles and proactively mitigate them to ensure high-quality, timely delivery.
Set and maintain rigorous standards for reliability, scalability, performance, security, and observability.
Build tools and services for automated infrastructure provisioning, deployment, and self-healing systems.
Participate in the on-call escalation chain and drive initiatives to reduce escalations and improve long-term system health.
Requirements
3+ years of experience designing and operating large-scale, highly reliable systems in public cloud environments.
A proven track record of leading complex technical projects end-to-end, from initial design through production operations.
Strong proficiency in Go, or in a comparable systems language with a willingness to ramp up on Go quickly.
Production-level experience with Kubernetes (running and operating workloads) and hands-on experience with at least one major cloud provider (AWS, GCP, or Azure), including Infrastructure as Code (IaC).
Experience using AI-assisted software development tools (e.g., Claude Code, GitHub Copilot) to increase engineering velocity.
Familiarity with stream processing concepts.
Superior written and verbal communication skills, with comfort working in a globally distributed, asynchronous environment.
Nice to have:
Experience building Kubernetes operators for stateful or storage workloads.
Background building or operating enterprise SaaS platforms.
Experience with streaming platforms as either a user or a core provider.
Direct exposure to production-grade agentic AI systems and experience building agentic workflows.