Prototype and train learning-based models using a data-centric approach, applying techniques such as automated feature engineering, active learning, and fine-tuning on curated datasets
Design, develop, and maintain efficient data and feature extraction pipelines to support ML engineers in accessing high-quality data for model training
Design auto-labeling systems using ensembles of models that can reason over multimodal data for different use cases, including semantic image labeling with vision-grounded models and ground-truth generation for intent and path prediction
Perform complex data extraction, transformation, and loading (ETL) processes, ensuring data is clean, accessible, and well-documented
Write and optimize high-quality SQL queries for data analysis and ingestion from various sources
Partner with data infrastructure and ML engineers to ensure seamless integration of data and machine learning workflows
Produce high-quality, maintainable code and participate in peer code reviews to share knowledge and uphold team standards
Requirements
Bachelor’s Degree or U.S. equivalent in Computer Science, Data Science, or a related field
5 years of professional experience as a Data Scientist, Machine Learning Engineer, Data Engineer, or any occupation, job title, or position performing software engineering and machine learning
5 years of professional experience utilizing SQL to write and optimize complex queries for extraction, analysis, and ingestion of structured, semi-structured, and unstructured data
5 years of professional experience utilizing machine-learning frameworks (including TensorFlow and PyTorch)
5 years of professional experience designing and developing data and feature extraction pipelines, including pipelines for multimodal data (including images, point clouds, or time-series)
5 years of professional experience training and prototyping machine-learning models using data-centric techniques including automated feature engineering, active learning, and fine-tuning
5 years of professional experience utilizing cloud platforms including AWS, GCP, or Azure
5 years of professional experience utilizing containerization and workflow tools including Docker, Kubernetes, or Airflow
5 years of professional experience collaborating with cross-functional engineering teams to integrate data pipelines and ML workflows
3 years of professional experience programming in Python, building scalable data pipelines, or implementing ETL workflows
1 year of professional experience working with relational databases and SQL (including Postgres, Redshift, or SQL Server)