Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
Lead technical initiatives around observability, incident response, and operational excellence — building systems that enable rapid detection and resolution of issues
Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint
Drive infrastructure-as-code practices using tools like Terraform and Pulumi to enable reproducible, auditable deployments
Mentor engineers and raise the technical bar across the organization through code reviews, design reviews, and technical leadership
Requirements
6+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
Long track record of building and scaling complex, large-scale distributed systems
Deep proficiency with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
Strong fluency in Infrastructure as Code (IaC) tools — Terraform, Pulumi, or CloudFormation
Solid understanding of Kubernetes, container orchestration, networking, and cloud security at scale
Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
Strong programming skills in Python, Go, or similar languages
Excellent problem-solving skills, a "spidey sense" of where things could go wrong, and a commitment to operational excellence
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Google Cloud Platform
Kubernetes
Python
Terraform
Go
Benefits
Work eligibility: Must be authorized to work in India. Visa sponsorship is not available for this role.