Design, develop, automate and maintain scalable, robust and reliable ELT/ETL data pipelines that collect, process and transform large volumes of structured and unstructured data from various sources
Maintain and enhance our existing data architecture to ensure smooth and efficient data flow across platforms
Interface with data peers, product managers and cross-functional stakeholders to gather requirements, sequence work, and document technical solutions
Implement best practices for data quality, integrity and governance, including monitoring, validation and auditing processes to ensure reliable and consistent data availability
Contribute to a team culture that values quality, robustness, and scalability while fostering initiative and innovation by staying up to date with industry trends and new technologies
Apply an AI-native data engineering mindset by using ML-driven validation, anomaly detection, automated schema evolution, and automated data lineage, and by optimizing pipelines for performance and resource efficiency
Requirements
5+ years of data processing and data engineering experience in a fast-paced environment with large-scale, cloud-based infrastructure (AWS experience required)
Hands-on software development experience in Python
Expert understanding of SQL, dimensional modeling, and analytical data warehouses such as Snowflake and Presto/Hive
Understanding of data engineering best practices for medium- to large-scale production workloads
Knowledge of big data processing frameworks (e.g., Spark, Hadoop)
Expertise with data pipeline orchestration tools, such as Airflow
Familiarity with processing semi-structured file formats such as JSON or Parquet
Bachelor’s degree in computer science, data science, or a related field