SandboxAQ is a high-growth company delivering AI solutions that address some of the world's greatest challenges. The AQNav team is looking for a highly accomplished Data Engineer to help build infrastructure that empowers the team with data and accelerates their models.
Responsibilities:
- Data Pipeline Development & Maintenance: Work across a mixed-maturity pipeline environment
- Data Modeling: Build and optimize data models that serve a diverse set of consumers. You'll make the data accessible and trustworthy, not just available
- Simulation Data Integration: Work within the in-house simulation suite to add data-capturing capabilities and ensure simulation outputs feed cleanly into downstream pipelines alongside real-world field data
- Data Quality & Observability: Instrument pipelines with quality checks, anomaly detection, and alerting so issues surface early
- Cross-Functional Data Support: Translate ambiguous asks into well-defined requirements, repeatable datasets, and lightweight dashboards that the team can use independently going forward
- Data Platform Infrastructure Contribution: Improve the features and reliability of our internal data platform over time
- Documentation: Own the technical documentation for pipelines, data models, and schemas you touch. In a team this cross-functional, good documentation is a force multiplier
Requirements:
- US citizenship (required for working with CUI data)
- 3+ years of industry experience as a Data Engineer in a startup or fast-moving environment
- Strong proficiency in Python and SQL, with hands-on experience building production-grade data solutions
- Experience designing and maintaining data pipelines and data models/warehouses that process large, structured scientific or engineering datasets
- Hands-on experience building on AWS (e.g., S3, ECS, Lambda, IAM) combined with CI/CD and containerization (e.g., GitHub Actions or CircleCI, Docker) to automate, deploy, and maintain data and ML workloads in the cloud
- Practical MLOps experience: setting up and operating MLOps frameworks (e.g., MLflow, DVC)
- A Master's or Ph.D. in a specialized technical field such as computer science, data science, or mathematics
- Experience working with sensor data (100 Hz to 1 kHz range)
- Ability to build interactive dashboards in Hex or similar
- Experience working with standard ML libraries like PyTorch and scikit-learn, and with basic supervised/unsupervised learning techniques