Design and operate event-driven data pipelines using Kafka consumers and Flink jobs to process high-volume customer events (clicks, purchases, returns) in near real time.
Build and optimize large-scale data transformations on Google Cloud Platform — BigQuery SQL, query performance tuning, and partitioning strategy at scale.
Develop Python data engineering workloads using Polars or Pandas at scale, with rigorous attention to Parquet partitioning, join performance on large datasets, and memory efficiency.
Build, deploy, and maintain ML pipeline components on Kubeflow Pipelines (KFP) and Vertex AI; package and deploy services with Docker.
Design event store architecture: partitioning by customer, time-ordered event assembly across heterogeneous sources, and schema management for mixed event types.
Partner with ML engineers, platform engineers, and data scientists to deliver clean, performant, model-ready data products.
Document architecture decisions and contribute to engineering standards across the platform team.
Requirements
6–12 years of experience in data engineering, platform engineering, or a closely related discipline.
Streaming: Production experience with Kafka consumers and Flink stream processing — building, deploying, and operating streaming jobs at meaningful scale.
GCP Data Stack: Strong SQL on BigQuery (or an equivalent cloud warehouse), with demonstrated expertise in query optimization, cost management, and partitioning.
Python Data Engineering: Hands-on experience with Polars or Pandas at scale; deep working knowledge of Parquet partitioning and of performance tuning for large joins.
ML Pipelines: Hands-on experience building and deploying components on Kubeflow Pipelines (KFP) and/or Vertex AI Pipelines; working proficiency with Docker.
Event Store Design: Demonstrated experience designing event stores — partitioning by customer, time-ordered event assembly across sources, schema strategy for mixed event types (clicks, purchases, returns).
Communication: Strong written and verbal communication; comfortable being the senior IC voice in design conversations with client stakeholders.
Nice to Have: Domain experience in Retail or E-commerce — customer journey data, transaction analytics, returns and exchanges modeling.
Nice to Have: Exposure to schema registry tooling (e.g., Confluent), Iceberg, or Delta Lake.
Nice to Have: Experience working in client-facing or consulting engagements.
Nice to Have: Google Cloud certifications (Professional Data Engineer or equivalent).
Tech Stack
BigQuery
Cloud
Docker
Google Cloud Platform
Kafka
Pandas
Python
SQL
Benefits
EXL is open to sponsoring H-1B transfers for qualified candidates.