The START Center for Cancer Research is the world’s largest early-phase site network dedicated to oncology clinical research. It is hiring a Sr. Data Engineer to build the data infrastructure for its Enterprise Data Platform, with a focus on developing data pipelines and ensuring data quality and lineage across integrated systems.

Responsibilities:

Build and maintain ingestion pipelines from source systems (OnCore, NetSuite, HubSpot, Microsoft Lists, Snowflake, FileMaker) into ADLS/Databricks
Implement incremental load patterns, change data capture, and idempotent pipeline design to ensure reliability
Design and implement metadata, data quality, and data lineage capabilities
Design and implement access control/RBAC capabilities
Design and implement data rights/licensing capabilities
Develop the ETL/ELT processes that feed Lakehouse/Relational/Warehouse modeling requirements (matching, deduplication, golden record assembly) as designed by the MDM Lead
Build publication pipelines that push canonical data models from the Data Platform back to spoke systems, coordinating with Integration Platform (Boomi based) and Messaging Platform (Azure Service Bus based) Teams
Implement Lakehouse/Relational/Warehouse tables for golden records across priority entities: Study/Protocol, Customer/Sponsor, Item/Charge Code, and Contract
Build the matching and survivorship logic based on rules defined by data model requirements and validated by business stakeholders
Implement versioning, lineage and audit trails using Delta Lake time travel capabilities for full traceability of master data changes
Configure Unity Catalog for data governance, access controls, and lineage tracking
Build automated data quality checks at ingestion, transformation, and publication stages
Develop data quality dashboards and alerting (integrate with existing monitoring tools or build in Databricks SQL)
Implement reconciliation count checks between source systems and the hub to detect drift or sync failures
Create exception handling pipelines that surface records requiring manual review
Build the data models supporting Revenue Cycle (e.g. OnCore-to-NetSuite) reconciliation (matching clinical events to financial transactions)
Develop the unbilled-vs-billed tracking datasets that compare recognized sales order lines against invoiced amounts
Create revenue accrual support datasets that feed the finance team’s automated journal entry processes in NetSuite
Support pass-through item mapping and amendment pricing reconciliation data needs as the finance team defines requirements
Establish CI/CD patterns for Databricks notebooks and jobs (Repos integration, testing frameworks)
Configure and manage job scheduling, cluster policies, and cost optimization
Maintain dev/staging/production environment separation
Document all pipelines, data models, and operational procedures

Requirements:

4+ years of experience as a data engineer, with at least 2 years on Azure Databricks or equivalent Spark-based platforms
Strong, current proficiency in Azure Data Factory, ADLS Gen2, Databricks, Delta Lake, Databricks SQL, and Azure SQL
Strong, current proficiency in Python, SQL, PySpark, and Spark SQL
Experience with data lakes, lakehouses, and data warehouses
Experience building production data pipelines with proper error handling, retry logic, idempotency, and monitoring
Familiarity with Azure and Azure ecosystem services
Experience with Unity Catalog, Purview or equivalent data governance/catalog tooling
Experience with Data Governance guidelines such as Data Classification, Retention, De-Identification, Tenancy, Sovereignty and Data Standards
Experience with CI/CD for data engineering workloads (Databricks Repos, Azure DevOps, or similar)
Experience with MDM data pipelines (matching, deduplication, golden record logic)
Familiarity with ERP data models (NetSuite preferred) or clinical trial management systems (OnCore)
Experience with Snowflake (the existing analytical layer we are integrating with)
Experience with Boomi or other iPaaS tools from a data engineering perspective
Background in healthcare, life sciences, or clinical research data
Experience building financial reconciliation or revenue recognition datasets

Sr. Data Engineer-Remote