Paramount is on a mission to unleash the power of content and is seeking a Senior Lead Data Engineer to build and scale the data foundations for its personalization systems. This role involves designing and operating data pipelines, ensuring data quality, and collaborating with ML and Science teams to deliver reliable, ML-ready data at scale.
Responsibilities:
- Build & Operate Large-Scale Feature Pipelines: Design and maintain batch/streaming pipelines (Spark, Flink, Databricks, Airflow) producing ML features for ranking models
- Ensure Point-in-Time Correctness: Develop feature sets that enable unbiased offline training and reliable online inference
- Develop Embedding & Content Pipelines: Build scalable workflows for metadata, imagery, and multimodal representations; partner with Science teams to operationalize new models
- Architect Data Foundations: Design Delta/Parquet data models and medallion layers, optimizing storage layout and partitioning for latency and cost
- Real-Time Engineering: Build Kafka-based systems for real-time features and user-activity aggregations, ensuring robust handling of out-of-order events and exactly-once semantics
- Governance & Leadership: Define data quality rules and schema evolution processes while collaborating across ML pods to translate model needs into infrastructure
Requirements:
- 7+ years of experience in large-scale data or software engineering
- Deep experience with Spark (PySpark/Scala), Databricks, Airflow, and Kafka
- Proficiency in building feature pipelines, implementing temporal joins, and mitigating training-serving skew
- Experience with AWS/Azure/GCP and cloud data warehouses such as Snowflake or Redshift
- Strong programming skills in Python and SQL, with a focus on performance optimization
- Experience in personalization domains (search, ranking, or recommender systems)
- Experience supporting petabyte-scale data lakehouses or feature stores
- Familiarity with GenAI/RAG systems, multimodal content, or Delta Live Tables
- Knowledge of causal inference, experimentation signals, or ML evaluation workflows
- Experience with Terraform for governed, repeatable deployments