Cedar is a leading healthcare technology company focused on improving the healthcare system through data science and smart product design. They are seeking a Senior Data Engineer to design and own critical data pipelines, improve data quality, and contribute to the evolution of their data ecosystem.
Responsibilities:
- Design, build, and own scalable ELT/ETL pipelines that power core use cases, including client billing, financial reporting, product analytics, and data services for downstream teams (Finance, Data Science, Commercial Analytics, Product)
- Modernize legacy data flows by migrating SQL- and Liquibase-based transformations into dbt, with robust testing, documentation and data contracts
- Improve reliability and observability of our data platform by implementing best practices in testing, monitoring, alerting and runbook-driven operations for pipelines orchestrated via Airflow or similar tools
- Model data for usability and performance in Snowflake and other systems, applying dimensional and domain-driven design patterns where appropriate (e.g., for analytics core models and financial engineering services)
- Partner closely with product, finance, analytics and integrations teams to understand requirements, define interfaces, and ensure data is accurate, well-documented, and delivered in the right form and cadence for consumers
- Contribute to Cedar’s data platform vision by helping decouple data infrastructure from data services, establishing standards for governance, metadata, and access, and piloting tools like OpenMetadata and data quality frameworks
- Provide technical mentorship to other engineers, raising the standard of our data engineering practices in areas like code quality, reviews, architecture, and operational excellence
- Balance short-term delivery with long-term architecture, making pragmatic trade-offs while moving us toward a clear 'North Star' data platform that supports emerging use cases like AI/ML, personalization and experimentation
Requirements:
- 5+ years of hands-on data engineering (or closely related software engineering) experience, including ownership of production data pipelines and systems at scale
- Strong SQL and Python proficiency, with experience building data transformations, utilities and tooling (e.g., dbt models, Airflow DAGs, internal libraries)
- Deep experience with modern data stack tools, including several of: Snowflake (or similar cloud data warehouse), dbt, Airflow/Dagster (or similar orchestrator)
- Proven track record designing and operating reliable pipelines, including testing strategies (unit/integration/dbt tests), monitoring, alerting, and incident/root-cause analysis for data issues
- Experience with data modeling and schema design for analytics, reporting and operational use cases (e.g., dimensional models, entity-centric designs, or medallion-style architectures)
- Familiarity with cloud platforms, ideally AWS (e.g., use of S3, IAM, containerized workloads, or related infrastructure supporting data workloads)
- Strong collaboration and communication skills, with the ability to translate ambiguous business problems into clear technical requirements and to work effectively with partners across engineering, product and business teams
- High ownership and bias to action in complex, evolving environments: comfortable operating with partial information, making trade-offs explicit, and driving work to completion
- Experience with metadata and data governance tools, such as OpenMetadata, DataHub or similar catalogs, and implementing data contracts or quality frameworks (e.g., Great Expectations, dbt tests)
- Exposure to streaming and event-driven data pipelines (e.g., Kafka, CDC tools) and integrating those into warehouse-centric architectures
- Prior experience in healthcare, fintech, or other highly regulated domains, particularly with standards like HL7 or FHIR, or with complex billing/financial data flows
- Familiarity with analytics and visualization tools (e.g., Looker, Hex) and enabling self-serve analytics through well-designed semantic layers and models
- Experience helping define team-level standards, patterns, and roadmaps for data engineering or platform teams