ArteraAI is an AI startup focused on developing medical artificial intelligence tests to personalize therapy for cancer patients. As a Machine Learning Engineer, you will work on the AI Platform team to establish scalable pipelines for model training and data processing, collaborating closely with model developers and platform engineers.
Responsibilities:
- Own Artera’s ML compute infrastructure, including scaling up Artera’s foundation model development by building distributed training infrastructure and developer libraries
- Build and evolve the core libraries used by AI scientists to develop, launch, and monitor AI products
- Work with model developers to optimize GPU and CPU efficiency and data throughput for large-scale foundation model and downstream model training runs
- Optimize how Artera stores and serves terabytes of digital pathology data so that it can feed large-scale training runs efficiently
- Ensure that Artera’s observability infrastructure gives a clear picture of where performance can be further optimized across our model landscape
Requirements:
- 5+ years of industry software engineering experience
- 4+ years of industry experience using one of PyTorch, TensorFlow, or JAX in Python
- 3+ years of industry experience building with AWS, Docker, and Kubernetes
- 1+ years of industry experience optimizing large-scale, high data-throughput, distributed machine learning training pipelines
- This is a remote role open to candidates who are currently authorized to work either in the United States or in Canada without the need for current or future employment-based visa sponsorship
- Experience using ML orchestration frameworks such as Flyte, Ray, Kubeflow, Metaflow, MLflow, Dagster, Argo Workflows, or Prefect
- Experience using Terraform and SQLAlchemy
- Experience with multi-node and multi-GPU training
- Experience deploying and maintaining infrastructure for machine learning training and production inference
- Familiarity with TorchScript, ONNX Runtime, DeepSpeed, AWS Neuron, or similar approaches to inference optimization