Apollo Research focuses on AI control research and is seeking an Applied Control Researcher to join its AGI safety product team. The role involves designing and implementing control protocols, running experiments to test monitoring effectiveness, and developing monitoring systems that improve AI safety.

Responsibilities:

Systematically collect and catalog coding agent failure modes from real-world instances, our internal deployments, public examples, research literature, and theoretical predictions
Design and conduct experiments to test monitor effectiveness across different failure modes and agent behaviors
Build and maintain evaluation frameworks to measure progress on monitoring capabilities
Build and maintain high-quality datasets for training and testing monitors
Iterate on monitoring approaches based on empirical results, balancing detection accuracy with computational efficiency
Stay current with research on AI safety, agent failures, and detection methodologies
Stay current with research into coding security and safety vulnerabilities
Develop and maintain a comprehensive library of monitoring prompts tailored to specific failure modes (e.g., security vulnerabilities, goal misalignment, deceptive behaviors)
Experiment with different reasoning strategies and output formats to improve monitor reliability
Design and test hierarchical monitoring architectures and ensemble approaches
Optimize log pre-processing pipelines to extract relevant signals while minimizing latency and computational costs
Implement and evaluate different scaffolding approaches for monitors, including chain-of-thought reasoning, structured outputs, and multi-step verification
Fine-tune open-source models to create efficient monitors for high-volume production environments
Design and build agentic monitoring systems that autonomously investigate logs to identify both known and novel failure modes
Build automated red-teaming pipelines that attack monitors at scale
Design iterative adversarial games in which a red team and a blue team continuously attack and defend monitors, respectively
