Vals AI is seeking exceptional researchers and research engineers to design and build the next generation of AI benchmarks. In this role, you will lead the design and development of novel benchmarks that assess the real-world capabilities of LLMs, and you will help shape how AI systems are evaluated.
Responsibilities:
- Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities
- Conduct research to ensure our benchmarks are valid, reliable, and meaningful
- Collaborate with foundation model labs and enterprises to understand evaluation needs
- Analyze model performance across benchmarks and communicate the findings clearly
- Publish research findings and contribute to the broader evaluation research community
- Work closely with the infrastructure team to implement your benchmark designs at scale
- Stay current with the latest developments in LLM capabilities and evaluation methodologies