Vals AI is seeking exceptional researchers and research engineers to design and build the next generation of AI benchmarks. The role involves creating evaluations that assess the capabilities of foundation models and working closely with major labs and enterprises to define standards for LLM evaluation.

Responsibilities:

Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities
Conduct research to ensure our benchmarks are valid, reliable, and meaningful
Collaborate with foundation model labs and enterprises to understand evaluation needs
Analyze model performance across benchmarks and communicate findings
Publish research findings and contribute to the broader evaluation research community
Work closely with the infrastructure team to implement your benchmark designs at scale
Stay current with the latest developments in LLM capabilities and evaluation methodologies

Member of Technical Staff - Research
