Terawatt is a leader in delivering large-scale, turnkey charging solutions for the transition to autonomous and electric vehicles. The Staff DevOps Engineer will help drive the evolution of Terawatt's platform, focusing on developing and maintaining the charging network management system while ensuring the reliability and scalability of its cloud infrastructure.
Responsibilities:
- Lead and architect the evolution of our cloud infrastructure using Terraform, building resilient and scalable systems to support business growth
- Maintain Helm charts and deployment patterns that enable teams to manage the lifecycle of their services while adhering to established deployment standards
- Build tooling to enable engineering teams to own the application deployment process through CI/CD pipelines using GitHub Actions
- Promote security best practices across all layers of the stack, including software access, managed workloads, and services running in pre-production and production environments
- Strengthen cloud and network security using industry-standard tools to detect vulnerabilities and anomalies, and help prevent suspicious or malicious activity
- Advance observability practices using frameworks such as OpenTelemetry (OTel) and tools like Grafana Cloud for monitoring and alerting across services and infrastructure
- Develop tooling that supports both local and remote container-based cloud development workflows
- Create and automate simulated production scenarios used for testing during development and validating production releases
- Implement automation and alerting to maintain security and compliance standards, including SOC 2 controls
- Design and manage infrastructure that supports machine learning model training and deployment, ensuring scalable compute resources for ML workloads
- Partner with the Data team to manage core data infrastructure, including our Databricks data lake and Kafka event streams (Aiven/AWS), while advising on scalable data architecture and infrastructure improvements
- Contribute to building a highly available, web-based depot operations platform in Node.js that supports the future of EV charging
- Participate in a 24/7 on-call rotation to support the reliability of production systems
Requirements:
- 8+ years of experience building and operating high-availability production software systems, preferably on DevOps or platform engineering teams
- Experience building and maintaining scalable cloud-based infrastructure, including services running in managed Kubernetes (EKS)
- Experience building or maintaining CI/CD pipelines (e.g., GitHub Actions) to support reliable software delivery
- Experience leading or contributing to SRE or DevOps initiatives supporting production cloud platforms
- Experience with observability frameworks and tools (e.g., OpenTelemetry, Grafana, or similar platforms)
- Experience working with managed databases such as PostgreSQL, MongoDB, or similar systems
- Strong communication skills and the ability to collaborate effectively across engineering, product, and infrastructure teams
- Experience working with multi-region AWS infrastructure and Kubernetes (EKS) at scale
- Experience improving security and compliance practices through automation and internal tooling
- Experience implementing or scaling observability standards using OpenTelemetry and tools like Grafana Cloud
- Experience maintaining or scaling data infrastructure, such as Databricks, Kafka (MSK), or similar streaming/data platforms
- Proficiency in Python or Node.js