Komodo Health is on a mission to reduce the global burden of disease through smarter use of data. The Senior Data Engineer will play a critical role in shaping the core data products that power the Healthcare Map, designing and optimizing the large-scale data pipelines behind them to help improve health outcomes across the industry.
Responsibilities:
- Build, operate, and optimize large-scale production data pipelines using Python, SQL, Airflow, cloud infrastructure, and distributed processing frameworks, including robust data quality checks, validation, lineage, observability, monitoring, and alerting
- Design and scale agentic data acquisition and extraction systems for complex, unstructured public data sources; develop LLM-powered Human-in-the-Loop (HITL) pipelines for data extraction and curation
- Transform healthcare claims, EHR, non-claims-based, and reference datasets into trusted, performant Healthcare Map data products and serving-ready data assets
- Contribute to system design, architecture, code quality, testing, documentation, CI/CD, and rotational production support, including debugging complex data, system, and performance issues across computationally intensive workflows
- Partner with Data Product Quality, Product, Platform, and Engineering teams to translate healthcare data needs into scalable technical solutions that enable downstream analytics, product, and AI/ML use cases
Requirements:
- Strong hands-on experience building, operating, and debugging production-grade data pipelines at scale in AWS, with sharp instincts for data quality, reliability, root-cause analysis, and production troubleshooting
- Advanced Python and SQL skills; experience with Airflow or similar orchestration tools and Spark or comparable distributed processing frameworks
- Ability to communicate technical trade-offs clearly and collaborate across engineering, product, and data teams
- Comfort using AI-assisted engineering tools for productivity, debugging, documentation, and technical exploration
- Healthcare data experience is a plus, but not required
- Ability to optimize high-scale data architectures for performance, cost, versioning, and large-volume productization
- Experience applying AI or agentic workflows to engineering, data quality, delivery, or operations
- Proven success in high-growth or ambiguous environments that require balancing architecture, speed, and quality