Torc Robotics is a leader in autonomous driving technology, focusing on developing software for automated trucks. They are seeking a Senior Autonomy Data Engineer to design and operate the data infrastructure that supports their autonomy program, ensuring reliable data pipelines and effective collaboration with cross-functional teams.

Responsibilities:

Own the design and organization of the program’s data lake, including schema definitions, partitioning strategy and metadata indexing
Design and maintain end-to-end pipelines that reliably ingest high-bandwidth sensor logs from vehicles into cloud storage and tolerate ad-hoc and intermittent connectivity
Develop data validation and integrity checks that can detect corrupted information, missing sensors, and inconsistent calibration prior to the data being processed by downstream systems
Implement retention, tiering and lifecycle policies for data to balance storage costs with development value
Build tooling that queries raw logs and produces curated training and evaluation datasets
Build automation to run cost-effective pseudo-labeling workflows at the scale of data ingest
Implement data quality and model performance metrics that are used to direct labeling effort toward the highest-value examples
Deploy and maintain data visualization tooling to support log review, annotation QA, and autonomy debugging workflows
Build integrations between the visualization tooling and the data lake so engineers can navigate from a dataset entry or model failure directly to the origin log data
Work with autonomy engineers to define and surface custom visualization panels and implement metrics for analyzing unstructured operating environments
Build dashboards that give autonomy engineers visibility into data coverage by terrain type, operating environment and geographic region
Establish and document data contracts between the data services and model training consumers
Partner with perception, planning and embedded engineers across the data lifecycle, from shaping the logging schemas and collection triggers to defining the dataset interfaces that supply model training and evaluation
Define data engineering standards, best practices, and tooling choices for an innovative and fast-paced team
Contribute to the data roadmap and provide input to technical leadership on investment priorities
Mentor junior engineers and raise the team’s capabilities in data infrastructure scalability and operational hygiene

Requirements:

Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Electrical Engineering or a related field with 6+ years of data engineering experience, or a Master's degree with 4+ years
Strong proficiency in Python and SQL, with demonstrated ability to build production-quality data pipelines
Deep experience with cloud data infrastructure (AWS preferred: S3, Glue, Athena, Redshift, or equivalent) and infrastructure-as-code tools (Terraform, CloudFormation)
Solid understanding of data partitioning strategies and columnar storage formats (Parquet, ORC, etc.)
Experience building and operating data pipelines that process time-series and binary data
Proven ability to evaluate and integrate open-source tooling when appropriate versus building from scratch
Strong instincts for delivering data quality through first-class implementations of monitoring, validation and lineage tracking
Experience with autonomous vehicles, robotics, or other sensor-driven autonomous systems
Deep experience with Foxglove or Rerun beyond basic playback, e.g. building custom extensions or integrating them into a structured log review or annotation QA workflow
Familiarity with the MCAP CLI and/or Python library and experience converting MCAP data to columnar data formats for further querying and processing
Experience with data curation for ML training, e.g. diversity sampling, pseudo-labeling, and dataset versioning
