DataSpring is a trusted data connector in the healthcare industry, with over 25 years of experience. They are seeking a Data Engineer to support the design, development, and maintenance of data pipelines and models, ensuring data quality and collaborating with various stakeholders.
Responsibilities:
- Build and maintain ETL/ELT pipelines across Databricks, Azure SQL, and downstream gold-layer models supporting priority projects
- Support development and enhancement of enriched data models, including field-level enrichment logic, recency rules, and provider-level enrichment flags
- Assist in maintaining data logic, including reconciling source and target systems and resolving duplicate records and data discrepancies
- Assist in implementing medallion architecture patterns (bronze → silver → gold), ensuring data quality, traceability, and performance at scale
- Support identification and resolution of systemic data quality issues, including null handling, soft deletes, authorization flags, and incorrect organizational mappings
- Support implementation of data rules in collaboration with product, governance, and engineering stakeholders
- Assist in maintaining documentation (Confluence pages, mapping workbooks) that serves as a single source of truth for enrichment logic and data behavior
- Support collaboration with vendors and partners by providing detailed queries, validation logic, and corrective guidance on upstream data issues
- Collaborate with product owners and engineering teams to ensure data models align with product-defined use cases
- Support UAT and release readiness by preparing data, validating counts, and resolving last‑mile data defects under tight timelines
Requirements:
- Strong foundational SQL skills (complex joins, reconciliation, performance tuning)
- Familiarity with Databricks, Delta Lake, and Azure SQL
- Basic understanding of data modeling for analytical, operational, and API‑driven use cases
- Ability to support troubleshooting of messy, evolving enterprise data domains
- Excellent written and verbal communication, especially for explaining complex data behavior to non‑technical stakeholders
- Experience using Git, DevOps tools, and CI/CD pipelines for data engineering workflows
- 1–3 years of experience in a data engineering or analytics engineering role, including internships or academic projects
- Demonstrated success contributing to data modernization or migration initiatives in cloud environments
- Prior experience working with healthcare or other regulated data environments is highly desirable
- Bachelor's degree in Computer Science, Information Systems, Data Engineering, or a related field
- Azure Data Engineer Associate or related certification
- Coursework or certification in AI/ML