Anthropic is dedicated to creating reliable and safe AI systems. The ML/Research Engineer will focus on detecting and mitigating misuse of AI systems, developing classifiers, and ensuring user wellbeing through robust safety measures.

Responsibilities:

Develop classifiers to detect misuse and anomalous behavior at scale. This includes building synthetic data pipelines for training classifiers and methods for automatically sourcing representative evaluations to iterate on.
Build systems to monitor for harms that span multiple exchanges, such as coordinated cyberattacks and influence operations, and develop new methods for aggregating and analyzing signals across contexts.
Evaluate and improve the safety of agentic products: develop threat models and environments to test for agentic risks, and develop and deploy mitigations for prompt injection attacks.
Conduct research on automated red-teaming, adversarial robustness, and other areas that help test for or uncover misuse.

ML/Research Engineer, Safeguards