Deliver, manage, and operate scalable, trusted data products and platforms that enable analytics, AI/ML, and Generative AI use cases
Lead the curation of datasets and data pipelines created by business departments, data scientists, and other technology teams
Use modern tools, techniques, and architectures to automate the most common, repeatable, and tedious data preparation and integration tasks
Develop and improve standards and procedures that support quality development, testing, and production support
Act as an innovation catalyst by rapidly prototyping new approaches and turning the best ideas into production-grade capabilities
Design and develop durable, flexible, and scalable data pipelines, data load processes, and frameworks that automate the ingestion, processing, and delivery of structured and unstructured data, both batch and real-time streaming
Requirements
Bachelor's or Master's degree in Engineering, Computer Science, Information Technology, or an equivalent field
6+ years of experience in data warehouse design and data modeling patterns (relational and dimensional)
6+ years of experience developing with ETL tools such as Talend or Azure Data Factory (ADF)
Must have strong analytical skills for effective problem solving
Ability to work independently, handle multiple tasks simultaneously, and adapt quickly to change while working with a variety of people and work styles
Must be able to explain technical concepts clearly and concisely to non-technical audiences
Hands-on experience with at least one major cloud (AWS/Azure/GCP) and one warehouse/lakehouse technology (e.g., Snowflake, BigQuery, Redshift, Databricks/Lakehouse)
Strong proficiency in Python and/or Java/Scala; ability to build maintainable services and libraries
Experience with GitHub Copilot and Databricks Assistant is a plus
Experience building or operating streaming pipelines using Kafka, Kinesis, or Pub/Sub
Experience with Spark (or equivalent) and a workflow orchestrator (e.g., Airflow), plus familiarity with CI/CD and automated testing
Experience partnering with data science/ML teams, supplying training-ready datasets/features, and designing data products that support ML in production