Turing is the world’s leading research accelerator for frontier AI labs, and it is seeking a Remote Senior Python Engineer for LLM Evaluation. The role involves creating datasets for training and benchmarking large language models, evaluating AI-generated code, and collaborating with researchers to improve AI-driven coding solutions.
Responsibilities:
- Work on AI model training initiatives by curating code examples, building solutions, and correcting code, primarily in Python, with additional work in JavaScript (including ReactJS), C/C++, Java, Rust, and Go
- Evaluate and refine AI-generated code to ensure that it is efficient, scalable, and reliable
- Collaborate with cross-functional teams to improve AI-driven coding solutions as measured against industry performance benchmarks
- Build agents and automated tools in Python that verify code quality and identify recurring error patterns
- Form hypotheses about each stage of the software engineering lifecycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities at each stage
- Design mechanisms that automatically verify solutions to software engineering tasks