Nebius is leading a new era in cloud infrastructure for the global AI economy. They are seeking a Senior Technical Product Manager to join the Serverless AI product team. The role is responsible for owning product areas, making technical trade-offs, and driving customer engagement.
Responsibilities:
- Co-own the Serverless AI product roadmap — Jobs, Endpoints, and DevPods — taking primary ownership of specific product areas while collaborating closely with the other PM on shared priorities and cross-cutting decisions
- Write detailed, technically precise PRDs that engineering teams can execute against. Our PRDs specify CLI syntax, API contracts, state machines, and billing models — not abstract feature descriptions
- Make build/buy/defer decisions on capabilities like autoscaling, multi-node orchestration, HTTPS termination, secret injection, and health checking based on customer signal and strategic priorities
- Understand the full workload lifecycle (container image pull → VM provisioning → GPU attachment → workload execution → cleanup) well enough to identify bottlenecks and propose solutions
- Evaluate technical trade-offs in areas like container cold start optimization (image caching, snapshot restore, warm pools), GPU scheduling and bin-packing, and storage mount performance
- Work directly with engineers on architecture decisions for distributed training support, endpoint autoscaling policies, and fault tolerance mechanisms
- Stay current on the fast-moving serverless GPU infrastructure space — new inference frameworks (vLLM, TensorRT-LLM, SGLang), container runtimes, orchestration approaches — and translate trends into product direction
- Run customer discovery and feedback sessions with ML engineers and platform teams at AI startups and enterprises. Turn qualitative insight into specific product actions
- Analyze usage data, activation funnels, and churn patterns to identify where users get stuck and what features drive retention
- Track market dynamics, emerging technologies, and industry trends to inform product strategy and ensure Nebius stays ahead of where the market is heading
- Define and iterate on pricing, packaging, and tier strategy for Serverless AI
- Own the technical content strategy: quickstart guides, tutorials, reference architectures, and example workloads that reduce time-to-first-job
- Partner with marketing on developer-focused campaigns, webinars, and conference presence
- Work with Solution Architects and Sales to qualify serverless-fit opportunities and support technical evaluations
Requirements:
- You have built, shipped, and iterated on infrastructure or platform products used by developers or ML engineers. Not consumer apps. Not dashboards. Infrastructure
- You understand containers at a practical level: Docker, image registries, container runtimes, resource limits, networking. You've debugged why a container won't start, why a GPU isn't visible inside it, or why a mount isn't working
- You have working knowledge of GPU computing for AI/ML: what GPU types exist and when to use them, how training and inference workloads differ in resource requirements, what vLLM / TensorRT-LLM / Triton are and why they matter
- You can read a CLI reference and know if it's well-designed. You've shaped developer-facing APIs, CLIs, or SDKs
- You have run real customer discovery — not surveys, but technical conversations with engineers where you learned something that changed your product direction
- You have 3+ years of product management experience in cloud infrastructure, AI/ML platforms, or developer tools
Nice to have:
- Experience at a serverless or GPU cloud company
- Hands-on ML engineering background — you've trained models, deployed inference endpoints, or built ML pipelines yourself
- Experience with Kubernetes for ML workloads (Kubeflow, KServe, Ray Serve) and understanding of why many ML teams want to avoid it
- Prior experience building a product from early stage to scale in a fast-growing market
- Background in systems engineering, distributed systems, or site reliability engineering