Operationalize machine learning models by building and maintaining robust, scalable pipelines for training, evaluation, deployment, and lifecycle management across cloud, on-prem, and edge compute environments
Work closely with autonomy researchers, software engineers, systems teams, and field operators to translate mission requirements into deployable ML capabilities
Implement automated CI/CD workflows tailored to ML systems, ensuring repeatable experiments, reliable packaging, and continuous delivery of both up-to-date models and their associated data pipelines
Manage ML runtime infrastructure using containerization and orchestration frameworks (e.g., Docker, Kubernetes) and model serving platforms (e.g., Seldon, KServe, BentoML)
Develop monitoring systems to track model health, performance, data drift, system utilization, and mission relevance using tools such as Prometheus, Grafana, and ELK/EFK stacks
Ensure ML deployments meet defense, customer, and platform security requirements, with emphasis on data integrity, traceability, and operational reliability
Evaluate and integrate emerging MLOps, distributed training, and edge inference technologies to enhance reproducibility, extensibility, scalability, and deployment speed of ML systems
Requirements
Bachelor’s degree in Computer Science, Electrical Engineering, Data Science, or a related technical field (Master’s preferred)
5+ years of professional experience in software engineering, machine learning engineering, MLOps, or related roles
Experience operationalizing ML systems at production scale, including model training, versioning, packaging, deployment, and monitoring
Strong proficiency in Python and familiarity with at least one deep learning framework (e.g., PyTorch, TensorFlow)
Hands-on experience with MLOps frameworks and workflow tooling (e.g., MLflow, Kubeflow, Airflow, DVC, BentoML)
Experience deploying containerized ML services using Docker and orchestrating workloads using Kubernetes (including air-gapped or constrained deployments)
Understanding of CI/CD workflows and DevOps practices applied to ML systems (e.g., Git, code review, metrics evaluation)
Familiarity with monitoring, observability, and logging platforms (e.g., Prometheus, Grafana, ELK/EFK)
Ability to obtain and maintain a U.S. Government security clearance (U.S. citizenship required)
Ability to travel up to 20%
Tech Stack
Airflow
Cloud
Docker
Grafana
Kubernetes
Prometheus
Python
PyTorch
TensorFlow
Benefits
Competitive salary
Equity
Comprehensive benefits package
401(k) with a 5% company match
Paid holidays and generous paid time off
Paid leave programs
Patent bonus program
Employee referral bonus program
Learning and development program
Opportunity to work with a team of highly skilled, creative, and motivated colleagues