Vals AI is seeking exceptional researchers and research engineers to design and build the next generation of AI benchmarks. The role involves leading the design and development of novel benchmarks that assess real-world capabilities of LLMs and influencing how AI systems are evaluated.

Responsibilities:

Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities
Conduct research to ensure our benchmarks are valid, reliable, and meaningful
Collaborate with foundation model labs and enterprises to understand evaluation needs
Analyze model performance across benchmarks and communicate findings
Publish research findings and contribute to the broader evaluation research community
Work closely with the infrastructure team to implement your benchmark designs at scale
Stay current with the latest developments in LLM capabilities and evaluation methodologies

Research Scientist
