Paramount is on a mission to unleash the power of content and is seeking a Principal Machine Learning Engineer to lead its ShortForm & Video Intelligence pod. The role involves developing models that automatically transform long-form content into engaging short-form assets while ensuring content safety and narrative comprehension.
Responsibilities:
- Lead the ShortForm Pod: Define the technical roadmap for automated clip generation and video insight, and manage a high-performing pod of ML engineers
- Architect Multi-Modal Models: Design and deploy models that synthesize signals across video (pixels), audio (speech/music), and text (transcripts/scripts) to understand content context
- Own Video Intelligence: Develop and scale robust computer vision and audio classifiers for nudity, profanity, and sensitive content detection
- Advance Narrative Understanding: Build models that can identify "hooks," "climax points," and "narrative summaries" to automate the creation of trailers and highlights
- Spoiler & Context Detection: Create intelligence layers that identify plot-critical "spoilers" to ensure ShortForm assets don't ruin the viewing experience for new users
- Scalable Content Pipelines: Partner with Content Engineering to integrate these models into petabyte-scale video processing workflows
Requirements:
- 6-8+ years of experience in machine learning, with a heavy focus on Computer Vision (CV) and Multi-Modal architectures
- Deep Learning Expertise: Mastery of PyTorch or TensorFlow, specifically for video-based tasks (e.g., Video Transformers, 3D CNNs)
- Video Intelligence: Proven experience building safety/moderation models (Nudity, Profanity, Hate Speech) or content-tagging systems
- Leadership: Experience leading senior technical teams and translating complex 'creative' goals into engineering requirements
- Video Tech Stack: Knowledge of video processing frameworks (FFmpeg) and handling various codecs/resolutions at scale
- Experience with Generative AI for Video (automated editing or style transfer)
- Background in NLP/Speech for synchronizing transcripts with video frames
- Knowledge of Narrative Theory or experience in the media/entertainment industry
- Experience deploying large-scale models in Cloud Environments (AWS/GCP/Azure) with high GPU utilization efficiency