Torc Robotics is a leader in autonomous driving technology focused on developing software for automated trucks. The Senior ML Engineer - Auto Tagger will be responsible for architecting and optimizing data pipelines, developing ML-assisted algorithms, and collaborating across teams to enhance data curation for autonomous trucking.
Responsibilities:
- Scenario Mining at Scale: Architect and optimize distributed data pipelines to process massive multi-sensor logs (camera, LiDAR, radar, kinematics), automatically extracting and cataloging safety-critical and long-tail driving events
- Advanced Event Tagging: Develop and tune both heuristic-based and ML-assisted algorithms (including exploration of Vision-Language Models or semantic vector search) to automatically classify and describe complex environmental and behavioral scenarios
- Standardized Data Structuring: Extract and format scenario data using the Pegasus layer standard (alongside open-source frameworks) to ensure semantic consistency and rigorous metadata integrity
- Data Flywheel Integration: Manage the ingestion of tagged events into the observations database, enabling high-speed querying and retrieval for ML training, regression testing, and system validation
- Cross-Functional Alignment: Operate with broad autonomy to drive consensus across organizational boundaries. Collaborate closely with downstream consumers in perception, simulation, and systems engineering to define what constitutes an "interesting scenario" and operationalize a continuous data loop
- Mentorship & Team Growth: Guide, mentor, and elevate less-experienced engineers. Lead design reviews, establish coding standards, and foster a culture of technical excellence and collaborative problem-solving
Requirements:
- BS or MS in Computer Science, Robotics, Engineering, or a related STEM field, with 6+ years of experience in data engineering, ML systems, or autonomous data curation
- Core Languages: Strong Python and SQL skills, with extensive experience processing massive time-series or unstructured datasets
- ML & Dataset Curation: Hands-on machine learning and dataset curation experience, with a demonstrated history of building targeted datasets that measurably improve downstream model performance
- Data Exploration: Hands-on experience using Databricks (or similar platforms) for large-scale analytics, interactive querying, and making massive vehicle datasets searchable
- Cloud & Compute: Expertise in distributed compute frameworks (Ray, Spark, Beam) and cloud platforms (AWS, GCP, or Azure) for executing large-scale data workloads
- AV Standards: Experience parsing complex data formats and applying scenario-description standards such as Pegasus layers
- Communication: Exceptional ability to translate complex data engineering challenges into clear strategies for cross-functional stakeholders
- Technical Leadership: Proven track record of mentoring teams, driving system architecture, and defining engineering roadmaps
- Auto-labeling & VLMs: Familiarity with foundation models, auto-labeling pipelines, or zero-shot classification for scenario extraction
- Model Serving: Experience with vLLM, SGLang, or similar frameworks for highly optimized, high-throughput model serving and inference
- Semantic Inference: Experience with semantic extraction and attribute mapping to help build out a robust semantic inference engine, moving beyond standard bounding-box object detection
- Data Tooling: Familiarity with parsing robotics formats (ROS bags, MCAP) and optimizing data in high-performance columnar formats (Parquet, Arrow)
- Downstream Integration: Knowledge of how scenario data feeds into generative simulation workflows, neural rendering, or sensor fusion validation
- Advanced Retrieval: Experience building semantic retrieval systems or vector databases for automotive data