Zeely is an AI-driven marketing platform that helps small businesses grow through high-performing content and automation. The Senior DevOps Engineer will take ownership of evolving and scaling the infrastructure, focusing on reliability, developer experience, and cloud cost optimization across a multi-cloud environment.
Responsibilities:
- Maintain and optimize AWS infrastructure (EKS, RDS, SQS, S3, OpenSearch and related services) to ensure reliability, scalability, and efficient resource usage
- Manage and scale GCP environments supporting AI services and data pipelines (e.g., BigQuery)
- Maintain supporting infrastructure across Cloudflare and Hetzner, including networking, edge services, and production traffic routing
- Drive Infrastructure as Code (Terraform) across all providers to ensure consistency and maintainability
- Maintain and improve Kubernetes-based production environments, ensuring high availability and autoscaling
- Refine monitoring, logging, and alerting to improve signal quality and incident response
- Ensure high availability and resilience of production infrastructure under growing workloads
- Design and lead the multi-cloud strategy, including future integrations (e.g., Azure)
- Collaborate with engineering teams to improve CI/CD pipelines and developer experience
- Implement and own APM and observability tooling
- Continuously optimize cloud infrastructure costs (FinOps) while maintaining performance and reliability
- Participate in on-call rotations, ensuring timely response to critical infrastructure incidents
Requirements:
- Deep expertise in AWS (EKS, EC2, RDS, SQS, S3, OpenSearch, etc.)
- Advanced experience with Terraform (modules, state management, multiple cloud providers)
- Strong Kubernetes (EKS) administration and troubleshooting skills
- Hands-on experience with GCP (e.g., BigQuery, Compute Engine, CDC replication)
- Strong Linux, networking, and security fundamentals
- English at Intermediate+ level and fluent Ukrainian for day-to-day internal communication with the team
- Experience with Cloudflare (R2, caching) and Hetzner infrastructure
- Previous experience setting up or managing Azure environments
- Experience optimizing OpenSearch / ELK stack performance
- Experience deploying and monitoring infrastructure for AI/ML workloads in AWS or GCP