Valiant Harbor International is seeking a Software Development Engineer III to support the Director’s Office at the Advanced Research Projects Agency for Health. The role involves building agentic AI systems, developing LLM applications, delivering product features, and ensuring reliability in collaboration with internal and external partners.
Responsibilities:
- Design and build agentic AI systems and orchestration:
  - Design and build GRACE's core agentic workflows (e.g., multi-step reasoning, planning, memory, and tool-use across single and multi-agent systems)
  - Implement and evolve A2A communication patterns at the application layer, enabling GRACE agents to collaborate and hand off tasks
  - Build and maintain the tool-calling layer (tool definitions, input/output schemas, error handling, retry logic, and result formatting)
  - Manage the MCP client-side integration
  - Design multi-agent workflows that are reliable, observable, and debuggable in production
- Facilitate LLM application development:
  - Own LLM orchestration at the application layer (prompt construction, context management, model selection logic, and response parsing)
  - Build and maintain RAG features (query formulation, result ranking, citation grounding, and hallucination mitigation)
  - Implement and iterate on prompt engineering patterns and system prompts across OpenAI GPT, Anthropic Claude, and Google Gemini
  - Manage context window budgets (truncate, summarize, paginate, etc.) and build the logic that makes those decisions correctly
  - Build evaluation pipelines for LLM quality (grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes)
  - Keep prompts and pipelines cost-efficient without sacrificing output quality
- Manage features and products:
  - Translate ambiguous product requirements into clear technical designs that can ship quickly
  - Build new GRACE capabilities end-to-end (from backend application logic through to the API contract the frontend consumes)
  - Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
  - Perform oversight and quality assessments: write tests, handle edge cases, and make sure your features degrade gracefully when upstream dependencies fail
- Manage reliability and collaboration with internal/external partners:
  - Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
  - Define and monitor application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
  - Build fallback and guardrail logic for AI services
  - Write production-quality code: readable, tested, reviewed, and documented
  - Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
  - Participate actively in design reviews, mentor other engineers, and communicate technical decisions clearly to both engineers and non-engineers