Ellipsis Health is creating cutting-edge AI/ML products that solve healthcare staffing issues and administrative burdens using conversation-based software and patented voice biomarker technology. They are seeking an experienced Senior Data Platform Engineer to lead the design and development of a scalable data platform that supports analytics and ML Ops while collaborating with various teams to implement end-to-end pipelines.
Responsibilities:
- Lead the design, development, and operation of a scalable and secure data platform to support analytics, ML Ops, and business intelligence
- Collaborate closely with Data Science, Machine Learning, Application and DevOps teams to implement end-to-end ML Ops pipelines
- Architect and manage data warehousing solutions using Databricks, dbt, and Spark
- Develop and maintain ETL/data pipelines that handle structured and unstructured data across diverse sources
- Optimize data storage, access, and processing for cost-efficiency and performance in GCP and AWS Cloud environments
- Build and maintain dashboards and analytics solutions using tools such as Sigma, Metabase, and other BI platforms
- Ensure compliance with data governance, security, and privacy best practices, including HIPAA, SOC-2, and other regulatory requirements
- Evaluate and integrate third-party anonymization and security solutions to protect sensitive data
- Provide strategic guidance on the evolution of the data platform to meet the company's growth and technical needs
- Design and implement scalable infrastructure for Large Language Model (LLM) operations, including training, fine-tuning, and inference workflows
- Collaborate with AI/ML teams to build and optimize LLM serving platforms for real-time and batch processing
- Develop monitoring and observability solutions for LLMs, ensuring model performance, cost-efficiency, and compliance with ethical AI guidelines
- Evaluate and integrate state-of-the-art LLM technologies into existing data platforms to enhance analytics and decision-making
Requirements:
- Bachelor's or Master's Degree in Computer Science or equivalent experience
- 5+ years of industry experience in designing and building large-scale data platforms
- Strong expertise in SQL, Data Modeling, and Data Warehousing (Databricks, Snowflake, Redshift, BigQuery, etc.)
- Proficiency in writing advanced SQL queries and in performance tuning
- Strong proficiency in Python for building, optimizing, automating and maintaining data pipelines and services
- Deep experience with Apache Spark and distributed data processing frameworks
- Hands-on experience with modern ETL and orchestration frameworks such as Airflow and dbt
- Knowledge of business intelligence tools such as Sigma, Metabase, Tableau, and Looker
- Strong familiarity with cloud-based infrastructure and managed data services in GCP and AWS Cloud
- Experience with CI/CD pipelines to automate testing, deployment, and release of data engineering and analytics workflows using GitLab, GitHub, etc.
- Experience with tools like Kubernetes, Terraform, Pub/Sub, and Debezium
- Exposure to building data quality frameworks and automation
- Understanding of data governance, privacy, and regulatory frameworks (HIPAA, SOC-2, HITRUST)
- Experience working with ML Ops platforms and supporting Data Science teams
- Experience with ML Ops tools such as MLflow and Streamlit, and with vector databases
- Familiarity with healthcare data standards (FHIR, HL7)
- Experience in real-time data processing and event-driven architectures
- Expertise in implementing data access controls and anonymization techniques