Design and ship production AI systems — multi-agent orchestration, routing, and specialized agents that take a request and carry it through to a reliable outcome.
Automate manual operational work across onboarding, support, exceptions, and document/data understanding, cutting processes that take hours or days down to seconds.
Build the models behind the decisions — forecasting, prediction, matching/allocation, optimization, and reliability scoring that ground the product in data instead of guesswork, exposed as services the agent layer can call.
Design the learning loop. Instrument decisions and their outcomes so models continuously improve, with the data and evaluation infrastructure to support it.
Own reliability and evaluation. Build the eval harnesses, tracing, observability, and guardrails for complex AI workflows where mistakes carry real operational and financial consequences — and prove a model or agent beats the status quo before it ships.
Make the model-vs-agent-vs-rules calls: know when a model genuinely wins, when an agent is the right tool, and when a simple rule is the smarter answer.
Raise the bar and help the team grow — push our prototyping-to-production pipeline forward and mentor engineers as the AI team scales.
Requirements
You've shipped production AI/ML, not just prototypes — and dealt with the real tradeoffs of edge cases, quality, latency, cost, and reliability.
You have real depth in at least one of these, and working fluency across both:
Generative / agentic AI — multi-agent orchestration, tool/function calling, RAG, structured outputs, and the modern stack (e.g., LangGraph/LangChain, MCP), across providers (Amazon Bedrock, Azure OpenAI, Anthropic, OpenAI).
Applied ML / decision intelligence — forecasting, optimization, matching/allocation, ranking, or prediction models that drive operational decisions with measurable business impact.
You design and trust your own evaluation — offline and online, tied to business outcomes, with safe rollout (e.g., shadow mode) and drift monitoring.
You're deeply hands-on and ship fast: strong in Python and modern API/service frameworks (e.g., FastAPI), with sound ML-systems and architecture instincts.
You've built for operationally complex or high-stakes environments where quality and reliability genuinely matter.
You communicate clearly, make decisions quickly, and can lead technical work without needing heavy process.
Bonus points:
Background in logistics, supply chain, transportation, marketplaces, mobility, or fulfillment.
Operations research / optimization, or reinforcement learning / bandits for sequential decision-making.
Multimodal / document understanding, computer-use, or browser automation.
Real-time / streaming systems, feature stores, and production MLOps at scale.
Patents or peer-reviewed publications, or experience as an early/founding engineer.
Tech Stack
Azure
Python
Benefits
Competitive salary, stock options, and performance-based bonuses
Fully remote
Comprehensive medical, vision, and dental insurance