Paramount is on a mission to unleash the power of content and is seeking a Senior Tactical ML Engineer to resolve high-impact issues across machine learning systems. The role involves diagnosing and stabilizing systems, implementing fixes, and collaborating across teams so that systems stay reliable and development can continue.
Responsibilities:
- Perform rapid diagnosis across model, data, code, infrastructure, and evaluation layers for blocked or unstable efforts
- Identify root causes and define corrective actions required to restore progress
- Communicate findings and resolution plans clearly across research, engineering, and operational teams
- Contribute directly to blocked ML initiatives by implementing fixes across model behavior, data pipelines, and system architecture
- Develop and validate solutions, including debugging, targeted refactoring, and experimental validation
- Build enabling components or modifications required to unblock downstream development
- Ensure that resolved systems are stable, validated, and ready for continued development
- Provide clear handoff artifacts, including working code, documentation, and recommended next steps
- Establish preventative measures to reduce recurrence of identified issues
- Work across research, infrastructure, platform, evaluation, and integration teams to align on root causes and resolution plans
- Ensure that fixes are compatible with downstream systems and integration requirements
- Validate that changes do not introduce regressions through appropriate testing and benchmarking
Success measures:
- Resolution efficiency: High-impact issues are resolved with clear root-cause identification and durable fixes
- Execution pace: Time from escalation to restored progress is minimized without compromising quality
- System stability: Resolved systems maintain reliability and do not regress under continued use
- Knowledge transfer: Owning teams receive sufficient context, documentation, and artifacts to continue development autonomously
Requirements:
- Senior-level experience spanning software engineering, machine learning systems, and infrastructure in production or production-adjacent environments
- Solid debugging capability across multiple system layers, including application code, data pipelines, distributed training, and deployment systems
- Experience diagnosing and resolving complex issues in ML systems under time constraints
- Well-developed operational judgment, including the ability to triage, prioritize, and execute with incomplete information
- Effective communication skills and ability to collaborate across multiple technical disciplines