Vouched is building a powerful identity verification platform to provide worldwide access to critical services. They are seeking a Senior/Staff DevOps Engineer who will improve and develop infrastructure and reliability across the full product lifecycle, ensuring observability, performance, and security in their systems.
Responsibilities:
- Design, build, and operate cloud infrastructure (GCP preferred) using infrastructure-as-code, with an emphasis on repeatability, security, and cost efficiency
- Own and continuously improve CI/CD pipelines, including automated integration and unit testing, provisioning, deployments, and rollbacks, to keep delivery fast and safe
- Build and maintain observability across the platform, including monitoring, logging, tracing, alerting, and meaningful dashboards that surface issues before customers notice them
- Improve and advance our security posture: secrets management, encryption in transit and at rest, IAM and least-privilege access, network segmentation, and vulnerability management across all infrastructure
- Drive compliance readiness by partnering with security and leadership to maintain, automate, and provide evidence for controls across frameworks such as SOC 2, ISO 27001, and GDPR (HIPAA a plus), including audit support and continuous control monitoring
- Lead incident response and the on-call rotation; drive blameless postmortems, reduce mean time to recovery (MTTR), and turn lessons learned into lasting fixes
- Define and uphold reliability targets (SLOs/SLIs), capacity planning, and performance tuning as we scale across countries and industries
- Leverage AI-powered tooling (Claude Code, Cursor, GitHub Copilot, and others) to accelerate infrastructure-as-code, automation, and internal tooling, and to improve incident triage and response
- Partner with engineering to improve developer experience and deployment velocity, removing friction and automating away toil
- Drive a culture of operational excellence, reliability, security, and continuous improvement across the engineering organization
- Set technical direction for platform and infrastructure, and mentor engineers on DevOps, reliability, and security best practices
- Continuously evaluate and adopt emerging AI-powered tools and workflows to improve infrastructure speed
Requirements:
- 6+ years of experience in DevOps, Site Reliability, Platform, or Infrastructure Engineering within a software engineering organization
- Deep expertise with a major cloud provider (GCP preferred) and strong understanding of networking, security, and distributed systems
- Strong hands-on experience with infrastructure-as-code (Terraform, Pulumi, and/or CloudFormation) and configuration management
- Production experience with containers and orchestration (Docker, Kubernetes, or ECS) and with building robust CI/CD pipelines (GitHub Actions, CircleCI, or similar)
- Proficiency with observability and monitoring stacks (Datadog, Prometheus/Grafana, CloudWatch, or equivalent)
- Solid scripting and programming skills (Python, Go, Bash, or TypeScript/Node) to build automation and tooling
- Strong grasp of cloud security best practices: IAM and least-privilege, secrets management, encryption, network security, and vulnerability management
- Hands-on experience supporting compliance frameworks such as SOC 2, ISO 27001, GDPR, and HIPAA, including control implementation, evidence and audit readiness, and compliance automation
- Proficiency with AI-powered development tools such as Claude Code, Cursor, GitHub Copilot, or equivalent; demonstrated ability to accelerate infrastructure and automation work with them
- Experience leading incident response and participating in an on-call rotation for production systems
- Excellent written and verbal communication skills; ability to document systems clearly and write actionable runbooks
- Experience working in a startup environment is required
- Experience managing or collaborating with distributed teams is essential
- Familiarity with identity verification products or AI/ML-based solutions is a plus