Nexaminds is on a mission to redefine industries with AI, with a focus on innovation and collaboration. The company is seeking a Senior Data Engineer to lead the development, optimization, and scaling of data solutions, particularly on Databricks, in a fast-paced environment.
Responsibilities:
- Design and build the core reusable ingestion engine in Python and ADF — parameterised, config-driven, zero hardcoding
- Build Python ingestion modules: file readers, schema validators, format handlers (CSV, EDI X12, FHIR R4, Parquet, JSON)
- Implement PySpark / Scala transformation components for batch and streaming at scale on Azure Databricks
- Write config-driven SQL data models for Bronze, Silver, Gold medallion transformations
- Develop a metadata-driven validation layer: null checks, type enforcement, range rules, referential integrity
- Build reusable utility libraries: logging, error handling, retry logic, dead-letter routing
- Implement Databricks notebooks and DLT (Delta Live Tables) pipelines for declarative transformations
- Build and maintain the onboarding template library v1 and v2 — parameterised, documented, production-ready
- Onboard Provider, Claims, Member, Eligibility, and Reference data domains using the framework
- Write unit tests, integration tests, and data contract tests (pytest, Great Expectations or equivalent)
- Optimise Spark jobs: partitioning, caching, broadcast joins, Z-ordering on Delta tables
- Participate in code review, follow GitHub branching standards, and contribute to documentation
Requirements:
- 5+ years of data engineering experience in production Azure environments (Python, SQL, Spark)
- Python: production-grade OOP, config-driven design, no hardcoding, type annotations
- PySpark / Spark: DataFrames, schema enforcement, partitioning, performance tuning
- SQL: advanced window functions, CTEs, incremental load patterns, Delta Lake DML (MERGE, UPDATE, DELETE)
- Azure Data Factory: parameterised pipelines, linked services, triggers, IR configuration
- Azure Databricks: notebooks, Jobs API, DLT, cluster configuration, Unity Catalog access
- ADLS Gen2, Delta Lake / Parquet format, Medallion store patterns
- Testing discipline: pytest, unit and integration tests, data quality assertions
- Git: feature branching, PR workflow, commit discipline, code review
- Scala: Spark Dataset API, typed transformations, sbt build tooling
- Healthcare data formats: EDI X12 (837/835/834), FHIR R4 resource parsing
- Delta Lake: schema evolution, time travel, OPTIMIZE, VACUUM, Z-ordering
- dbt (data build tool) for SQL transformation layering and lineage documentation
- Databricks Asset Bundles (DABs) for pipeline-as-code deployment
- DP-203 Azure Data Engineer Associate certification