You will imagine, design, and run experiments to understand how architectural decisions propagate into inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn findings into measurable gains in generation speed and model quality.
Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and by identifying which gains compound at scale.
Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.
Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
Write up findings as research papers, submit them to top venues, and present them at conferences.
Contribute to building AI agents that will perform architecture research and training experiments autonomously, starting from the research foundations we are laying now.
Requirements
You have worked on complex AI problems and have something concrete to show for it. A paper, a repository, a thesis, or a side project with evidence of serious technical thinking is what we want to see.
Strong signals include experience adapting or modifying existing model architectures, an understanding of how communication structure and layer dependencies affect inference behavior, and fluency in Transformers and MoE deep enough to reason about trade-offs.
Experience with post-training methods such as fine-tuning, preference optimization, or quantization is a plus, even without production-scale exposure.
Benefits
Direct access to AMD and NVIDIA datacenter GPUs from day one
A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions
Problems that sit on the critical path of model execution speed and that directly influence what the system can become
A hybrid working model: remote-friendly, with at least 50% of your time spent in our Paris office