Boson AI is pioneering the future of enterprise AI, focusing on cutting-edge AI research and solutions. In this role, you will engineer and evolve the core Agent OS: the dialog and policy engine, distributed context and memory, and the agentic orchestration frameworks built on top of them.
Responsibilities:
- System Ownership: Take ownership of the core dialog & policy engine. Define and implement the state machine for agent state representation, the decision-making logic, and the mechanisms for enforcing complex safety policies and guardrails at the execution layer of a workflow
- Distributed Context & Memory: Design, implement, and maintain the high-performance context and memory systems. Focus on low-latency, reliable access to conversational and user history, including the tight integration and optimization of RAG and vector retrieval pipelines for production use
- Agentic Orchestration Frameworks: Define, architect, and deliver robust agentic orchestration patterns, including battle-tested planner–executor schemes, ReAct-style reasoning and acting loops, and resilient, multi-step workflows that programmatically combine tools, LLMs, and stateful memory
- Internal SDK/Framework Development: Build and evolve the internal, production-grade equivalent of frameworks like LangChain/LlamaIndex. Design composable graphs and execution chains with clear APIs and type safety that product engineering teams and low-code builders can safely reuse, extend, and deploy at scale
- Voice Runtime Infrastructure: Own and optimize the voice runtime components for streaming audio, low-latency barge-in detection, and reliable turn-taking protocols. This requires deep collaboration with Application and ML Platform teams to meet tight latency, jitter, and quality of service (QoS) constraints
- Tooling & Integration Architecture: Architect a robust, secure tooling and integration framework (MCP/A2A). This includes building the underlying infrastructure for tool registration, handling complex authentication/authorization, implementing rate limiting/circuit breaking, managing retries, and ensuring typed, validated I/O between agents and external microservices
- Platform Observability & Reliability: Define, instrument, and monitor rigorous SLIs/SLOs for the Agent Platform. Lead engineering efforts to continuously improve reliability, enhance system debuggability (rich, step-level traces and structured logging), and drive core performance optimizations over time
- API & Abstraction Design: Ensure the platform's public-facing APIs and internal abstractions are clear, well-documented, and fundamentally sound, enabling junior and senior engineers alike to compose sophisticated agent behavior without violating system invariants or introducing breaking changes
- Advanced Capabilities R&D: Explore and prototype future capabilities, focusing on the engineering challenges of on-device personalization, privacy-preserving federated learning signals, and novel policy adaptation techniques that influence agent behavior in production