Cedar is a leading healthcare technology company focused on improving the healthcare system through data science and smart product design. The Senior Data Engineer will be responsible for building and maintaining scalable data pipelines, modernizing data flows, and ensuring data quality and observability, all while collaborating with cross-functional teams.
Responsibilities:
- Design, build, and own scalable ELT/ETL pipelines that power core use cases, including client billing, financial reporting, product analytics, and data services for downstream teams (Finance, Data Science, Commercial Analytics, Product)
- Modernize legacy data flows by migrating SQL- and Liquibase-based transformations into dbt, with robust testing, documentation and data contracts
- Improve reliability and observability of our data platform by implementing best practices in testing, monitoring, alerting and runbook-driven operations for pipelines orchestrated via Airflow (and/or similar tools)
- Model data for usability and performance in Snowflake and other systems, applying dimensional and domain-driven design patterns where appropriate (e.g., for analytics core models and financial engineering services)
- Partner closely with product, finance, analytics and integrations teams to understand requirements, define interfaces, and ensure data is accurate, well-documented, and delivered in the right form and cadence for consumers
- Contribute to Cedar’s data platform vision by helping decouple data infrastructure from data services, establishing standards for governance, metadata, and access, and piloting tools like OpenMetadata and data quality frameworks
- Provide technical mentorship to other engineers, raising the bar for our data engineering practices in areas like code quality, reviews, architecture, and operational excellence
- Balance short-term delivery with long-term architecture, making pragmatic trade-offs while moving us toward a clear 'North Star' data platform that supports emerging use cases like AI/ML, personalization and experimentation
Requirements:
- 5+ years of hands-on data engineering (or closely related software engineering) experience, including ownership of production data pipelines and systems at scale
- Strong SQL and Python proficiency, with experience building data transformations, utilities and tooling (e.g., dbt models, Airflow DAGs, internal libraries)
- Deep experience with several modern data stack tools, such as Snowflake (or a similar cloud data warehouse), dbt, and Airflow or Dagster (or a similar orchestrator)
- Proven track record designing and operating reliable pipelines, including testing strategies (unit/integration/dbt tests), monitoring, alerting, and incident/root-cause analysis for data issues
- Experience with data modeling and schema design for analytics, reporting and operational use cases (e.g., dimensional models, entity-centric designs, or medallion-style architectures)
- Familiarity with cloud platforms, ideally AWS (e.g., use of S3, IAM, containerized workloads, or related infrastructure supporting data workloads)
- Strong collaboration and communication skills, with the ability to translate ambiguous business problems into clear technical requirements and to work effectively with partners across engineering, product and business teams
- High ownership and bias to action in complex, evolving environments—comfortable operating with partial information, making trade-offs explicit, and driving work to completion
- Experience with metadata and data governance tools, such as OpenMetadata, DataHub or similar catalogs, and implementing data contracts or quality frameworks (e.g., Great Expectations, dbt tests)
- Exposure to streaming and event-driven data pipelines (e.g., Kafka, CDC tools) and integrating those into warehouse-centric architectures
- Prior experience in healthcare, fintech, or other highly regulated domains, particularly with standards like HL7 or FHIR, or with complex billing/financial data flows
- Familiarity with analytics and visualization tools (e.g., Looker, Hex) and enabling self-serve analytics through well-designed semantic layers and models
- Experience helping define team-level standards, patterns, and roadmaps for data engineering or platform teams