Cedar is a leading healthcare technology company focused on improving the healthcare system through data science and smart product design. The Principal Data Architect & Engineer will lead the development of a new enterprise data platform, ensuring scalability and reliability while mentoring a team of engineers.
Responsibilities:
- Architect the Future-State Data Model & Storage: Lead the design and execution of a new, high-scale data model and storage architecture that lives outside our legacy monolith. You will drive decisions on service boundaries, data ownership, and storage patterns—implementing Medallion architecture (Bronze, Silver, Gold layers) to ensure progressive data quality refinement—while defining the API contracts that will anchor Cedar’s data strategy for years to come. This includes real-time event-driven pipelines and streaming architectures alongside batch analytics
- Ensure Metrics Reliability: Own the accuracy and reliability of Cedar’s core business metrics including collection rate, days in AR, and invoice balance tracking. Design validation frameworks that ensure our source of truth remains consistent across all reporting and product surfaces
- Power Intelligent Agents: Build the real-time data foundations that power Cedar’s AI-driven products, ensuring that agents such as Kora have the context, entity graphs, and metrics they need to act autonomously in revenue cycle workflows and transform the patient experience
- Design for the Future: Design data models for new product lines that expand Cedar’s presence across the full revenue cycle from pre-service through post-service collections
- Build Production Code: You are a hands-on builder. You will write, ship, and maintain high-quality production code alongside our team of Data Engineers. You will personally model critical domains and implement the core patterns that the rest of the team will follow
- Guide and Mentor Engineers: Act as the technical role model for a team of Data Engineers. You will raise the bar for engineering excellence through rigorous code reviews, design guidance, and technical mentorship, turning every challenge into a growth opportunity for the team
- Collaborate: Drive technical alignment across data engineering, platform, and product teams. Set priorities, unblock dependencies, and ensure delivery against customer-facing commitments
- Steward Data Integrity: Define and enforce standards for Cedar's enterprise Data Dictionary and metadata strategy. You will partner with engineering and product to ensure data is accurate, discoverable, and synchronized across all environments
Requirements:
- 10+ years of experience in data engineering and backend systems at scale
- Experience leading large technical projects with a high-accountability mindset, focused on delivering value
- Proven track record of shipping production enterprise systems that customers depend on
- Deep, hands-on proficiency with a wide range of data and engineering technologies and patterns, such as Snowflake, dbt, Liquibase, Fivetran, Airflow, OpenMetadata, Kafka, SQL, Python, streaming/event-driven architectures, CDC (Change Data Capture), and real-time data processing
- Experience building production-grade ELT/ETL pipelines and managing complex data transformations in cloud-first environments
- Experience building 'source of truth' systems and a passion for data accuracy
- Understanding of how to define and audit key business metrics to ensure they remain resilient to system changes
- Experience with data validation and reconciliation, comparing pipeline output against source systems and methodically diagnosing discrepancies
- Experience building data APIs, context systems, or feature stores that power downstream ML/AI applications
- Ability to spot over-engineering and under-engineering with equal confidence
- Knowledge of how to design for resilience, modularity, and backwards compatibility
- Preference for incremental milestones over big bang deliverables
- Ability to translate complex technical risks and trade-offs into business-relevant terms for senior stakeholders and cross-functional partners
- Avid learner in general, and an adopter of AI engineering tools like Claude Code and Codex
- Experience directing AI agents to generate complete, high-quality solutions and applying judgment to test and own the output
- Demonstrable understanding of healthcare concepts, including revenue cycle management (RCM), billing, and insurance adjudication