Kohl's is seeking a Senior Data Engineer to lead the development and ownership of domain data products. The role involves designing and maintaining data pipelines, ensuring data reliability, and partnering with cross-functional teams to enable analytics and AI use cases.

Responsibilities:

Design, build and maintain batch, streaming and real-time Artificial Intelligence (AI) feature pipelines that extract data from diverse source systems and producers (Application Programming Interfaces (APIs), events, databases, files), ensuring efficient ingestion, transformation and publishing
Design, refine and implement scalable data models, semantic layers and data contracts to promote consistency, reuse and accessibility
Own the end-to-end data product lifecycle for the domain. Define and maintain data contracts, including service level agreements (SLAs), schema expectations, quality metrics and consumer ownership, to ensure a reliable and trustworthy experience
Partner with cross-functional teams to co-design scalable data solutions that meet business needs and clearly define the boundaries between data pipeline responsibilities and model-building activities
Develop automated workflows and Continuous Integration / Continuous Deployment (CI/CD) pipelines using tools such as Airflow, Apache Spark and Python to drive reliability and faster delivery
Implement validation, observability and evaluation frameworks that ensure accuracy, lineage and timeliness across data pipelines and large language model (LLM) outputs
Apply and enforce governance, privacy and compliance standards (GDPR, PCI DSS, CCPA), ensuring data security and traceability
Partner with cross-functional teams to translate business needs into technical data solutions that scale across domains
Drive performance tuning, automation and adoption of AI-powered data tools to enhance data platform efficiency
Mentor data engineers and champion best practices for maintainable, governed and reusable data assets
Own cost and performance tradeoffs for domain data products: monitor compute usage, storage growth and unit cost, and implement optimizations that reduce spend while meeting SLAs
Additional tasks may be assigned

Requirements:

4+ years designing, building and optimizing data pipelines and models in production, ideally within large-scale cloud environments
Proficiency in SQL and Python (or Scala) for data development, testing and automation
Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering or a related field
Experience with Apache Spark (or equivalent) for large-scale data processing and performance optimization
Experience using Airflow/Cloud Composer/Dagster for orchestration, transformation and CI/CD pipelines
Experience with cloud warehouses/lakes (BigQuery, Redshift, Snowflake) and object storage
Experience designing and optimizing streaming pipelines using Kafka, Pub/Sub or Spark
Strong understanding of dimensional modeling, normalization and schema design for analytics and GenAI integration into data products
Experience with data testing, lineage, monitoring and observability frameworks to ensure data integrity and reliability
