Role Overview
What you will do
- Collaborate on designing and implementing new data infrastructure and pipelines that prepare data for large-scale ML workflows
- Care about data quality, and ensure the pipelines you build are robust, scalable, and maintainable
- Work with DICOM data to feed into foundation model and disease-specific imaging model development
- Collaborate closely with Machine Learning Scientists, DevOps Engineers, and other Data Engineers to create a tight feedback loop and ensure the end-to-end process is effective and efficient
- Ensure that our data processes have quality and compliance designed in from the start to make reproducibility, lineage tracking, and data quality painless
- Scale pipelines to handle millions of scans: ingesting the imaging data, transforming it, filtering it, and structuring it ready for foundation model development
Requirements
What we need...
- Proven experience as a Data Engineer in complex, data-rich environments
- Strong programming skills in Python
- Experience building and maintaining production ML data pipelines, including orchestration tools such as Dagster and cloud infrastructure on AWS
- Experience with Docker and Kubernetes-based infrastructure
- Experience working with large datasets
- Understanding of data preprocessing and quality control for machine learning
- Strong collaboration skills with machine learning or technical teams
Even better if you have experience of...
- Medical imaging data such as CT or MRI, or the DICOM standard
- Large-scale datasets or foundation model workflows
- Deployment tooling (Helm, plus GitOps tooling such as Flux and Kustomize)
- Data versioning and reproducibility frameworks
- Database design and data modelling
- Working in regulated environments such as GxP or ISO 13485
- ML experiment tracking or metadata management (MLflow)
Tech Stack
- AWS
- Cloud
- Docker
- Flux
- Kubernetes
- Python
Benefits
- A comprehensive benefits package that includes an annual bonus plan, private medical insurance, life insurance, and a contributory pension scheme
- 25 days' annual leave, plus bank holidays and enhanced maternity leave
- A diverse work environment that brings together experts in many fields, including software engineering, DevOps, data science, machine learning, quality assurance, regulatory affairs, and clinical operations