About this role

Turing, the world’s leading research accelerator for frontier AI labs, is seeking a remote Senior Python Engineer for LLM evaluation. The role involves creating datasets for training and benchmarking large language models, evaluating AI-generated code, and collaborating with researchers to improve AI-driven coding solutions.

Responsibilities:

Work on AI model training initiatives by curating code examples, building solutions, and correcting code — primarily in Python, with additional work in JavaScript (including ReactJS), C/C++, Java, Rust, and Go
Evaluate and refine AI-generated code to ensure that it is efficient, scalable, and reliable
Collaborate with cross-functional teams to measure and improve AI-driven coding solutions against industry performance benchmarks
Build agents and automated verification tools in Python that can verify the quality of code and identify error patterns
Form hypotheses about each step in the software engineering cycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on each of them
Design verification mechanisms that can automatically verify a solution to a software engineering task

Remote Senior Python Engineer – LLM Evaluation (US-based)
