Vals AI is seeking exceptional researchers and research engineers to design and build the next generation of AI benchmarks. The role involves creating evaluations that assess the capabilities of foundation models and working closely with major labs and enterprises to define standards for LLM evaluation.
Responsibilities:
- Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities
- Conduct research to ensure our benchmarks are valid, reliable, and meaningful
- Collaborate with foundation model labs and enterprises to understand evaluation needs
- Analyze model performance across benchmarks and communicate findings
- Publish findings and contribute to the broader evaluation research community
- Work closely with the infrastructure team to implement your benchmark designs at scale
- Stay current with the latest developments in LLM capabilities and evaluation methodologies