Kohl's is seeking a Senior Data Engineer to lead the development and ownership of domain data products. The role involves designing and maintaining data pipelines, ensuring data reliability, and partnering with cross-functional teams to enable analytics and AI use cases.
Responsibilities:
- Design, build and maintain batch, streaming and real-time Artificial Intelligence (AI) feature pipelines that extract data from diverse source systems and producers (application programming interfaces (APIs), events, databases, files), ensuring efficient ingestion, transformation and publishing
- Design, refine and implement scalable data models, semantic layers and data contracts to promote consistency, reuse and accessibility
- Own the end-to-end data product lifecycle for the domain. Define and maintain data contracts, including service level agreements (SLAs), schema expectations, quality metrics and consumer ownership, to ensure a reliable and trustworthy experience
- Partner with cross-functional teams to co-design scalable data solutions that meet business needs and clearly define the boundaries between data pipeline responsibilities and model-building activities
- Develop automated workflows and Continuous Integration / Continuous Deployment (CI/CD) pipelines using tools such as Airflow, Apache Spark and Python to drive reliability and faster delivery
- Implement validation, observability and evaluation frameworks that ensure accuracy, lineage and timeliness across data pipelines and large language model (LLM) outputs
- Apply and enforce governance, privacy and compliance standards (GDPR, PCI DSS, CCPA), ensuring data security and traceability
- Partner with cross-functional teams to translate business needs into technical data solutions that scale across domains
- Drive performance tuning, automation and adoption of AI-powered data tools to enhance data platform efficiency
- Mentor data engineers and champion best practices for maintainable, governed and reusable data assets
- Own cost and performance tradeoffs for domain data products: monitor compute usage, storage growth and unit cost, and implement optimizations that reduce spend while meeting SLAs
- Additional tasks may be assigned
Requirements:
- 4+ years designing, building and optimizing data pipelines and models in production, ideally within large-scale cloud environments
- Proficiency in SQL and Python (or Scala) for data development, testing and automation
- Bachelor's or Master's degree in Computer Science, Information Systems, Data Engineering or a related field
- Experience with Apache Spark (or equivalent) for large-scale data processing and performance optimization
- Experience using Airflow/Cloud Composer/Dagster for orchestration, transformation and CI/CD pipelines
- Experience with cloud warehouses/lakes (BigQuery, Redshift, Snowflake) and object storage
- Experience designing and optimizing streaming pipelines using Kafka, Pub/Sub or Spark
- Strong understanding of dimensional modeling, normalization and schema design for analytics and for integrating GenAI into data products
- Experience with data testing, lineage, monitoring and observability frameworks to ensure data integrity and reliability