Epoch AI is looking for a Software Engineer who will help us evaluate frontier AI models, enabling researchers, developers, and policymakers to better understand AI development. The role will involve running and maintaining our benchmarking infrastructure as well as contributing to the development of brand-new benchmarks.

Responsibilities:

Implement benchmarks: Add AI benchmarks to our evaluation infrastructure (primarily using the Inspect library) to expand the suite of capabilities we track. Maintain and improve our existing benchmarks so we can evaluate new model releases quickly and painlessly.
Develop new benchmarks: Contribute to the development of brand-new benchmarks. You will have the opportunity to pitch and prototype your own ideas in addition to helping out with existing projects.
Collaborate: Work closely with researchers, analysts, and other engineers at Epoch AI to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.

Requirements:

A strong software engineering background with more than two years of professional experience building and maintaining complex systems
Ability to regularly contribute high-quality, robust, and maintainable code
Comfortable diving deep into existing codebases and infrastructure
Ability to generate your own ideas for new benchmarks, experiments, novel approaches, and other projects
Motivated by Epoch AI's mission to provide rigorous, independent insight into key trends in AI
Desire to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks
Professional-level English proficiency
AI domain expertise or cybersecurity experience are strong pluses but not required
Hands-on experience running LLM evaluations
Familiarity with evaluation frameworks like Inspect
A solid grasp of current AI trends

Software Engineer, Benchmarking
