Develop and orchestrate multi-agent AI systems for automated test generation, test execution, and end-to-end development workflow optimization using frameworks like LangGraph, AutoGen, or the Anthropic Agent SDK (Claude Code)
Design and implement agentic workflows that coordinate multiple AI agents to autonomously drive test automation at the UI, API, integration, and system levels, from test case synthesis through result evaluation, while integrating cleanly with existing developer tools and MCP-compatible services
Build evaluation frameworks and custom benchmarks for agentic systems, including comparisons of AI agents against commercial solvers, using tools like AgentBench and Langfuse
Evaluate MCP server and tool performance across agentic pipelines, measuring latency, accuracy, context fidelity, and end-to-end task completion rates

BS/MS in Computer Science, Machine Learning, or a related applied AI field
Expertise in Python and ML frameworks (PyTorch, Transformers, scikit-learn)
Experience with Large Language Models applied to software understanding or test generation
Knowledge of AI evaluation methodologies and metrics for agentic task completion and test quality
Strong foundation in statistical analysis and experimental design
Experience with developer workflow and productivity measurement frameworks
Background in software engineering or QA, including close collaboration with development teams (preferred)
Familiarity with test automation frameworks (e.g., Playwright, Selenium, Pytest, Appium) and CI/CD pipelines (preferred)
Experience designing benchmarks that compare AI agents against commercial or domain-specific solvers (preferred)
Hands-on experience with MCP (Model Context Protocol), including building, evaluating, and optimizing MCP servers and tool integrations within agentic pipelines (preferred)
Experience with agentic AI frameworks including LangGraph, AutoGen, or the Anthropic Agent SDK / Claude Code (preferred)
Knowledge of vision-language models or multi-modal AI for UI and system-level understanding and evaluation (preferred)
Experience with cloud ML platforms such as Azure AI Foundry, Azure Machine Learning, or AWS (preferred)

Software Developer – Agentic Evaluation

Key skills