Anthropic is dedicated to creating reliable and safe AI systems. The ML/Research Engineer will focus on detecting and mitigating misuse of AI systems, developing classifiers, and ensuring user wellbeing through robust safety measures.
Responsibilities:
- Develop classifiers to detect misuse and anomalous behavior at scale. This includes building synthetic data pipelines for training classifiers and methods for automatically sourcing representative evaluations to iterate on.
- Build systems to monitor for harms that span multiple exchanges, such as coordinated cyber attacks and influence operations, and develop new methods for aggregating and analyzing signals across contexts.
- Evaluate and improve the safety of agentic products: develop threat models and environments that test for agentic risks, and build and deploy mitigations for prompt injection attacks.
- Conduct research on automated red-teaming, adversarial robustness, and other areas that help test for or uncover misuse.