Design, build, and maintain scalable end-to-end data pipelines using Databricks, Spark, and related technologies.
Develop efficient data processing and transformation workflows to support analytics and reporting needs.
Integrate diverse data sources including APIs, databases, and cloud storage into unified datasets.
Work closely with cross-functional teams (data science, analytics, business units) to design and implement data solutions that align with business goals.
Implement robust validation, monitoring, and observability processes to ensure data accuracy, completeness, and reliability.
Contribute to data governance, security, and automation initiatives within the data ecosystem.
Leverage AWS services (e.g., S3, Glue, Lambda, Redshift) to build and deploy data solutions in a cloud-native environment.
Requirements
Experience with cloud-based ETL services (e.g., AWS Glue, Google Cloud Dataflow, Azure Data Factory)
Experience with cloud data warehousing technologies (e.g., Amazon Redshift, Google BigQuery, Snowflake)
Experience with Python, SQL, Spark, and PySpark
Experience with data platforms such as Databricks, Palantir, and Snowflake
Familiarity with data orchestration and data quality processes