About this role

TikTok is the leading destination for short-form mobile video and is seeking a Research Scientist focused on privacy-preserving large-scale model training and architecture optimization. The role involves designing and optimizing training architectures for generative models while ensuring that privacy remains a priority throughout technical innovation.

Responsibilities:

Design and optimize large-scale training architectures for diffusion-based and unified generative models (e.g., DiT, Rectified Flow, hybrid AR + diffusion systems)
Lead GPU-centric performance optimization, including memory layout, communication overlap, kernel fusion, and throughput scaling across thousands of accelerators
Develop and evolve distributed training strategies (DP / TP / PP / ZeRO / FSDP-style sharding) tailored to long-running, multi-stage foundation model training
Build fault-tolerant, self-healing training systems that can sustain long-running jobs under frequent hardware, network, and software failures
Design mechanisms for fast failure detection, recovery, and minimal training interruption, including checkpointing strategies, restart policies, and controlled rollouts
Improve training efficiency, including ETTR (effective training time ratio), MFU (model FLOPs utilization), and overall accelerator utilization, under real-world production constraints
Optimize Diffusion Transformer training pipelines, including noise schedules, timestep strategies, and memory-efficient attention mechanisms
Support unified generation-and-understanding models, enabling shared context, long-sequence multimodal reasoning, and scalable training without architectural bottlenecks
Collaborate with research teams on architecture-level tradeoffs between quality, compute efficiency, and training stability

Research Scientist — Privacy-Preserving Large-Scale Model Training & Architecture Optimization