1Password is a leading company in cybersecurity, dedicated to building a safe and productive digital future. As a Senior Cloud Platform Engineer, you will be responsible for designing and implementing infrastructure solutions that support the company's services while ensuring reliability and scalability.
Responsibilities:
- Build platform infrastructure – Design and implement self-service tools that let product teams deploy services without infrastructure tickets or manual provisioning
- Reduce operational toil – Identify repetitive manual work and build automation to eliminate it
- Improve visibility and observability – Implement monitoring, alerting, and dashboards that help teams sleep soundly knowing their services are healthy. Build systems that detect problems before users do and make debugging production issues straightforward, whether it's 3pm or 3am
- Participate in on-call rotation – Respond to infrastructure incidents as part of the on-call rotation, and work to reduce incident frequency through better automation and resilience patterns
- Scale infrastructure – Plan capacity, optimize performance, and ensure our platform handles growing traffic without degradation. You'll work on problems like reducing deployment times, improving resource utilization, and maintaining sub-100ms p99 latencies
- Collaborate across teams – Partner with security, product engineering, and SRE teams to understand needs and build solutions that work for everyone
Requirements:
- 5+ years working with distributed systems and microservices in production environments
- Strong AWS experience – You know EC2, ECS/EKS, VPC networking, IAM, and can architect multi-AZ resilient systems
- Infrastructure as Code fluency – Daily experience with Terraform or CloudFormation. You think in code, not ClickOps
- Programming skills for automation – Comfortable writing Go, Python, or similar languages to build tools and automation
- Multi-tenant Kubernetes experience in production – You've deployed, scaled, and debugged containerized workloads in multi-tenant production clusters
- Observability expertise – Hands-on experience with Prometheus, Grafana, Datadog, or equivalent. You know what to monitor and how to alert effectively
- Incident response experience – You've been on-call, resolved outages, and written postmortems that led to systemic improvements
- Security-minded approach – You default to least-privilege, encrypt at rest and in transit, and think about threat models
- GitOps experience with FluxCD and Kustomize
- Service mesh experience (Istio, Linkerd, Consul)
- Cost optimization experience in cloud environments
- Open source contributions to infrastructure tooling
- Experience with compliance frameworks (SOC 2, ISO 27001) and policy as code (Kyverno)