Define the target-state MLOps architecture for Elliptic, covering model training pipelines, serving infrastructure, monitoring, feature management, and governance, and produce the architecture decision records that inform investment decisions
Make and document build-vs-buy-vs-stop recommendations with clear cost modelling and trade-off analysis, evaluating vendors, open-source tools, and managed services against Elliptic's constraints (AWS-primary, Databricks ecosystem)
Work with InfoSec to improve the existing model registry and model risk management framework, closing identified gaps in metadata, lineage, approval workflows, and drift/bias detection
Build model training pipelines, CI/CD for ML, and serving infrastructure, working directly with a small group of infrastructure engineers to ship production-grade platform capabilities
Instrument observability across the ML lifecycle: training metrics, serving latency and throughput, data quality, and prediction drift, integrating with Elliptic's existing observability stack
Work directly with data scientists and ML engineers across all four consumer groups to onboard them onto the platform, writing documentation, runbooks, and reference architectures that lower the barrier to self-service
Requirements
Have built MLOps platforms or ML infrastructure from the ground up, and can speak to what worked, what didn't, and why
Have operated in a regulated industry (e.g. compliance, financial services) and have hands-on experience building ML infrastructure to meet those regulatory demands
Think about ML infrastructure the way the best platform engineers think about data infrastructure: as a set of foundations with internal customers whose needs must be understood and balanced
Are comfortable operating in ambiguity, making decisions with incomplete information, and creating structure where none exists, while remaining open to changing course when better information arrives
Influence through clarity, evidence, and the quality of your work rather than positional authority. You earn adoption by making the platform genuinely better than the alternative
Care about production engineering quality: you write production-grade code, your systems are tested, observable, documented, and designed for others to operate
Deep hands-on experience building MLOps platforms, including model registries, feature stores, and ML pipeline orchestration
Working knowledge of model serving patterns: real-time inference, batch prediction, A/B deployment, and other deployment strategies
AWS infrastructure experience (ECS/EKS, S3, IAM, networking) and comfort operating in a Databricks ecosystem or equivalent lakehouse architecture
Experience with model monitoring: model evaluation, data drift detection, prediction drift, and performance degradation alerting
A track record of building something from zero and bringing it to a state where others could operate and extend it
Experience in a regulated industry (fintech, financial services, healthcare) where model governance is a compliance requirement
See AI as a core part of how modern engineering gets done, not a passing trend. You actively use it to think faster, prototype faster, and pressure-test your own designs, and you're excited that the bar keeps rising.
Prior experience running formal build-vs-buy evaluations with written decision records
Tech Stack
AWS
Benefits
Hybrid working and the option to work from almost anywhere for up to 90 days per year
£500 Remote working budget to set up your home office space
$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Holidays: 25 days of annual leave + bank holidays
An extra day for your birthday
Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, 16 weeks of fully paid leave.
Private Health Insurance: we use Vitality!
Full access to Spill Mental Health Support
Life Assurance: we hope you will never need this, but our cover is for 4 times your salary to your beneficiaries