Harnham is a scaling AI‑driven technology company building large‑scale, production‑grade ML systems used in real‑time decisioning. They are seeking a Senior Machine Learning Engineer who enjoys shaping platform architecture while staying hands‑on. The role focuses on production ML, with an emphasis on distributed compute and reliability.
Responsibilities:
- Designing, training, and deploying high‑scale ML models used in live systems
- Building distributed training pipelines (PyTorch, Ray)
- Owning the ML lifecycle across feature engineering, training, evaluation, inference, and monitoring
- Improving ML reliability, observability, and reproducibility
- Working closely with engineering, SRE, and product to shape platform direction
- Contributing to ML architecture standards, CI/CD, and testing frameworks
Requirements:
- Strong experience delivering production ML systems end‑to‑end
- Expertise with Python, PyTorch, and distributed compute (Ray, Spark)
- Background in large‑scale data processing and MLOps tooling
- Ability to diagnose production issues and drive architectural improvements
- Experience with event‑driven ML and model deployment frameworks is a plus