1Phi Health is a health tech startup focused on making healthcare more accessible. They are seeking a New Grad Data Engineer to build and maintain data pipelines, ensuring data quality and collaborating with data scientists and product engineers in the healthcare data domain.

Responsibilities:

Build and maintain data pipelines that ingest, transform, and validate large-scale Medicare claims data using SQL, Python, and Databricks (Spark). You'll work with patient-level records across billions of claim lines
Write and optimize complex SQL — multi-step transformations, window functions, joins across large datasets, aggregations with suppression rules. SQL is the primary language of the work
Automate and operationalize recurring data workflows — building reliable, repeatable pipelines that process CMS data extracts, dimension tables, and derived provider metrics
Ensure data quality by designing validation checks, reconciling source data against expected schemas, and investigating anomalies when numbers don't add up
Collaborate with data scientists and product engineers to define output schemas, deliver clean datasets, and support downstream analytics and application features
Work in cloud infrastructure — primarily Databricks on AWS, with exposure to S3, Unity Catalog, and related services
Learn the healthcare data domain — you'll develop working knowledge of claims data structures, medical coding systems (ICD-10, HCPCS, DRG), and CMS data programs

Requirements:

You have strong SQL skills, shown through coursework, internships, or projects where you wrote non-trivial queries (joins, CTEs, window functions, aggregations). You can reason about query performance
You're comfortable with Python. You've used it for data manipulation (pandas, PySpark, or similar). You don't need to be a software engineer, but you can write clean, functional code
You understand data pipeline concepts — ETL/ELT, idempotency, schema management, data validation. Exposure through coursework, capstone projects, or internships counts
You're detail-oriented and methodical. Healthcare data has strict rules around suppression, privacy, and accuracy. You care about getting the numbers right
You're a fast learner who's comfortable ramping up on unfamiliar domains. You'll be learning Medicare claims data, CMS programs, and healthcare coding systems on the job
You have a BS or MS in Computer Science, Data Science, Information Systems, Statistics, or a related field
You've worked with Spark, Databricks, or other distributed compute environments (even in a class or personal project)
You have exposure to cloud platforms (AWS, GCP, or Azure) — S3, IAM, or managed database services
You've touched healthcare data in any capacity — claims, EHR, public health datasets, MIMIC, CMS public use files
You're familiar with version control (Git) and collaborative development workflows
You've built a data project end-to-end — ingestion through delivery — even if it was small

New Grad Data Engineer (for Health Tech Startup)🤓
