Calix is a company focused on transforming Communication Service Providers through its cloud-first, AI-powered platform. It is seeking a skilled Staff Cloud Platform Engineer with expertise in Kafka to design, deploy, and optimize its event streaming infrastructure on Google Cloud Platform.
Responsibilities:
- Design, provision, and manage Apache Kafka clusters (self-managed on GCP/AWS or via Confluent Platform/Amazon MSK)
- Configure and tune brokers, ZooKeeper/KRaft, topics, partitions, replication factors, and retention policies for high throughput and low latency
- Perform cluster upgrades, rolling restarts, and broker replacements with zero downtime
- Implement and manage Kafka Connect pipelines for data ingestion and egress across heterogeneous systems
- Administer Kafka Streams and ksqlDB deployments for real-time stream processing workloads
- Maintain Schema Registry and enforce schema governance standards across teams
- Define and track SLIs/SLOs for consumer lag, throughput, end-to-end latency, and broker health
- Design and implement cloud infrastructure using infrastructure as code (IaC) with Terraform
- Build automated deployment pipelines for Kafka configuration changes using GitOps workflows (ArgoCD, Flux)
- Create self-service tooling and runbooks to reduce toil for development teams
- Automate topic provisioning, ACL management, and schema registration via APIs and CLI tooling
- Integrate tools such as GitLab CI/CD or Cloud Build for automated testing and deployment
- Ensure seamless integration of data pipelines with other GCP services such as BigQuery and Cloud Storage
- Monitor and optimize the performance, reliability, and cost of Kafka and streaming pipelines
- Implement security best practices for GCP resources, including IAM policies, encryption, and network security
- Ensure observability is an integral part of the infrastructure platforms, providing adequate visibility into their health, utilization, and cost
- Collaborate extensively with cross-functional teams to understand their requirements, educate them through documentation and training, and improve adoption of the platforms and tools
Requirements:
- 10+ years of overall experience in DevOps, cloud engineering, or data engineering
- 5+ years of experience running Kafka at production scale
- Deep expertise in Kafka internals: replication protocol, log compaction, consumer group coordination, partition leadership, and KRaft mode
- Proficiency with container orchestration (Kubernetes / Helm) and deploying Kafka via Strimzi, Confluent Operator, or equivalent
- Strong understanding of networking (VPC, peering, private endpoints, DNS, load balancing) in cloud environments
- Hands-on experience with Kafka Connect, Schema Registry, and at least one stream processing framework (Kafka Streams, Flink, Spark Structured Streaming)
- Proficiency in Google Cloud Platform (GCP) services, including Dataflow, Pub/Sub, Kafka, Dataproc, BigQuery, and Cloud Storage
- Expertise in Infrastructure as Code (IaC) tools like Terraform or Cloud Deployment Manager
- Familiarity with data orchestration tools like Apache Airflow or Cloud Composer
- Experience with CI/CD tools like Jenkins, GitLab CI/CD, or Cloud Build
- Knowledge of containerization and orchestration tools like Docker and Kubernetes
- Strong scripting skills for automation (e.g., Bash, Python)
- Experience with monitoring tools like Cloud Monitoring, Prometheus, and Grafana
- Familiarity with logging tools like Cloud Logging or ELK Stack
- Strong problem-solving and analytical skills
- Excellent communication and collaboration abilities
- Ability to work in a fast-paced, agile environment