Support agentic AI systems and orchestration, LLM application development, features and products, observability and reliability, and engineering excellence
Design and build core agentic workflows: multi-step reasoning, planning, memory, and tool-use across single and multi-agent systems
Implement and evolve A2A communication patterns at the application layer, enabling agents to collaborate and hand off tasks
Build and maintain the tool-calling layer, including tool definitions, input and output schemas, error handling, retry logic, and result formatting
Own the MCP client-side integration, including how agents discover, invoke, and compose tools exposed via MCP servers
Design multi-agent workflows that are reliable, observable, and debuggable in production, not just in demos
Own LLM orchestration at the application layer, including prompt construction, context management, model selection logic, and response parsing
Build and maintain RAG features, including query formulation, result ranking, citation grounding, and hallucination mitigation
Implement and iterate on prompt engineering patterns and system prompts that drive GRACE's quality and consistency across OpenAI GPT, Anthropic Claude, and Google Gemini
Manage context window budgets: know when to truncate, summarize, or paginate, and build the logic that makes those decisions correctly
Build evaluation pipelines for LLM quality, including grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes
Stay sharp on token economics and write prompts and pipelines that are cost-efficient without sacrificing output quality
Translate ambiguous product requirements into clear technical designs and ship them fast
Build new product capabilities end-to-end, from backend application logic through to the API contract the frontend consumes
Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
Collaborate closely with product, UX, applied science, and operations
Write tests, handle edge cases, and make sure features degrade gracefully when upstream dependencies fail
Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
Define and monitor application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
Build fallback and guardrail logic for AI services, including what happens when a model returns something unsafe, off-topic, or structurally wrong
Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
Write production-quality code: readable, tested, reviewed, and documented
Communicate technical decisions clearly to both engineers and non-engineers; no one should have to guess what you decided or why
Participate actively in design reviews, and push back when something is over-engineered or under-specified
Ensure strong privacy, security, and compliance in all application logic and data handling
Requirements
7+ years of experience with software engineering, including building and operating production systems
Experience in high-velocity environments where you owned and shipped complex products end-to-end
Experience with at least 2 backend languages, including Python
Experience building and operating systems on major cloud platforms, such as AWS, GCP, or Azure
Experience with containerization and working within CI/CD pipelines
Knowledge of modern backend frameworks and async patterns
Knowledge of algorithms, data structures, APIs, and software design patterns
Bachelor's degree in Computer Science or Software Engineering
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Python
Benefits
Health, life, disability, financial, and retirement benefits
Paid leave
Professional development
Tuition assistance
Work-life programs
Dependent care
Recognition awards program for exceptional performance