Define and lead the end-to-end testing strategy for Outreach’s GenAI platform, including agentic workflows, LLM tool calls, LangGraph orchestration, and supporting ML pipelines.
Design and implement evaluation systems that handle both deterministic and non-deterministic outputs.
Own testing across Outreach’s suite of AI agents.
Work closely with Data Science, MLOps, and platform engineers to ensure testability is designed in from the start.
Integrate evaluation pipelines into CI/CD workflows.
Establish and track metrics that matter for AI systems: answer quality scores, tool invocation accuracy, hallucination rates, latency, and regression trends across model and prompt changes.
Define standards for AI testing across the org — including prompt regression testing, retrieval quality evaluation, and agent behavior contracts.
Raise the quality bar across engineering teams by mentoring engineers.
Actively track developments in AI evaluation tooling, LLM benchmarking, and testing research.
Requirements
7–12 years of experience in software development and/or test automation, with demonstrated experience leading quality efforts on complex, distributed systems.
B.S. in Computer Science or a related technical field.
Strong programming skills in Python, with experience writing reusable, maintainable test frameworks.
Proven experience testing large-scale backend or platform systems, including microservices and API layers.
Deep understanding of test design principles, CI/CD integration, and scalable test automation.
Experience with test frameworks such as PyTest or equivalent.
Solid understanding of evaluation methodologies for non-deterministic systems — including statistical assertions, behavioral testing, and regression baselines.
Hands-on experience with Databricks for building and validating ML pipelines and data workflows.
Experience with MLflow for experiment tracking, model versioning, and pipeline observability.
Strong communication and collaboration skills across engineering, data science, and product functions.
Tech Stack
Distributed Systems
Microservices
Python
Benefits
We’re an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
New logo prospecting to expansions, deal acceleration, driving retention, and forecasting.
Outreach AI automates workflows and frees sellers to focus on more strategic conversations and actions.