Miraxis AI builds AI-assisted data generation systems for Physical AI and robotics. The company is seeking a Robotics Research Engineer, World Models & Synthetic Data, to develop the representation, simulation, and transition-prediction layers of its robotics data platform, with a focus on egocentric video understanding and synthetic data generation.
Responsibilities:
- Build world-model and transition-evidence pipelines for robotics video and demonstration data
- Represent physical claims as state-before, claimed interaction, expected state-after, observed state-after, evidence, uncertainty, and provenance
- Integrate video representation models such as V-JEPA-style models, VideoMAE, DINO-style encoders, or other frozen foundation models into reproducible inference pipelines
- Build latent residual scoring over video and multimodal representations
- Design lightweight transition predictors, probes, or calibrated residual models over frozen embeddings
- Build digital twin workflows for selected robotic tasks, scenes, objects, environments, and failure modes
- Use simulation and digital twins to generate controlled variations: object pose, lighting, occlusion, camera viewpoint, clutter, distractors, contact events, action perturbations, and rare failures
- Evaluate synthetic and simulated data for usefulness, not just visual realism
- Compare simulated or synthetic data interventions against real-world review outcomes and downstream evaluation metrics
- Build pilot reports on routing and synthetic-data value, including Lift@k, correction capture rate, confidence intervals, calibration, and failure analysis
- Work with computer vision engineers to incorporate masks, tracks, dense visual features, clip embeddings, object state, and scene structure into the transition-evidence layer
- Work with platform engineers to ensure outputs are versioned, reproducible, traceable, and stored as auditable artifacts
- Support post-pilot model development using human correction traces, model disagreements, deterministic validation failures, simulation deltas, and final accepted annotations
- Keep world-model and simulation outputs as evidence, stress-test signals, and data-generation aids; they should not automatically finalize annotations or replace human review for high-risk cases
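As an illustration of the transition-evidence representation described in the responsibilities above, here is a minimal Python sketch. The field names mirror the posting's wording (state-before, claimed interaction, expected state-after, observed state-after, evidence, uncertainty, provenance), but the class name, field types, cosine-based residual, and threshold values are all assumptions made for illustration and are not Miraxis AI's actual schema or scoring method.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TransitionClaim:
    """Hypothetical record for one physical claim about a video clip."""
    state_before: str
    claimed_interaction: str
    expected_state_after: str
    observed_state_after: str
    evidence: tuple = ()                 # e.g. artifact IDs for masks, tracks, embeddings
    uncertainty: float = 1.0             # assumed scale: 0.0 = certain, 1.0 = unknown
    provenance: dict = field(default_factory=dict)  # e.g. model and data versions


def latent_residual(predicted, observed):
    """Cosine distance between a predicted and an observed embedding.

    One possible residual score over frozen embeddings; other distances
    or calibrated models could be substituted.
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm = math.sqrt(sum(p * p for p in predicted)) * math.sqrt(sum(o * o for o in observed))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def needs_human_review(claim, residual, residual_threshold=0.3, uncertainty_threshold=0.5):
    """Flag a claim for human review; this never finalizes an annotation itself.

    Both thresholds are placeholder values chosen for illustration.
    """
    return residual > residual_threshold or claim.uncertainty > uncertainty_threshold
```

In this sketch the residual acts only as a routing signal: a large gap between predicted and observed embeddings, or a high stated uncertainty, sends the claim to a reviewer rather than changing the annotation.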
Requirements:
- Strong Python and PyTorch experience
- Hands-on experience with video models, robotics perception, simulation, synthetic data, self-supervised representation learning, temporal action recognition, action anticipation, or latent prediction
- Experience working with frozen foundation models and building task-specific probes, heads, residual models, or scoring functions on top of embeddings
- Strong understanding of video representation evaluation, uncertainty, calibration, weak signals, and human-in-the-loop ML
- Experience with simulation or digital twin workflows for robotics, perception, physical scenes, or embodied AI
- Experience with GPU inference, batch processing, experiment tracking, artifact versioning, and reproducible evaluation
- Ability to design evaluation plans that avoid overfitting small, biased, or noisy pilot datasets
- Comfort working with egocentric and robotics video failure modes, including occlusion, hidden hands, camera motion, fast motion, object confusion, temporal aliasing, weak labels, ambiguous contact, and sim-to-real gaps
- Strong statistical judgment, including practical use of Lift@k, AUROC, AUPRC, calibration curves, Brier score, confidence intervals, and bootstrapping
- Experience with models and methods such as V-JEPA, JEPA-style architectures, VideoMAE, DINO, DINO-WM, latent world models, or action-conditioned prediction, or with egocentric and robotics datasets and benchmarks such as EPIC-KITCHENS, Ego4D, BridgeData, DROID, RoboCasa, or ManiSkill
- Experience with digital twins, robotics simulation, or synthetic data generation using tools such as NVIDIA Isaac Sim, Omniverse, MuJoCo, Genesis, Unreal, Unity, Blender, RoboSuite, or similar
- Experience with domain randomization, procedural scene generation, sim-to-real validation, synthetic data filtering, or counterfactual data generation
- Experience with Physical AI, robotics perception, embodied datasets, human demonstration data, temporal segmentation, or state-change detection
- Experience building active-learning, uncertainty-ranking, review-routing, or human-review prioritization systems
- Experience evaluating model outputs against human corrections, operational review outcomes, or downstream robotics performance
- Experience with video artifact storage, vector search, MCAP, Rerun, ROS/ROS 2, or similar tools for multimodal data inspection
- Experience working with 3D assets, object state, scene graphs, camera calibration, pose estimation, or spatial representations