About this role

Noon is on a mission to reinvent how designers work in the AI era by building next-generation AI design tools for product teams. They are hiring an AI Platform Engineer to own model serving in production, optimize inference latency and cost, and ensure the reliability of the AI capabilities in their product.

Responsibilities:

Architect and operate the inference platform: serving stack, autoscaling, multi-tenancy, observability
Optimize end-to-end latency (TTFT, TPOT, p95) with quantization, batching, KV-cache management, and speculative decoding
Design multi-GPU parallelism strategies (DP / TP / PP) and own GPU utilization and cost economics
Build a hybrid local + cloud serving architecture — small models on the user’s Mac for fast paths, larger models in the cloud for slow paths
Own canary deployment, rollback automation, and SLO/SLA-driven reliability for all AI features
Build production observability: latency, drift, quality, and cost dashboards
Evaluate and integrate inference engines (vLLM, Triton, TGI, TensorRT, MLX) for cloud and on-device paths
Take fine-tuned models from research artifacts to production traffic
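TTFT, TPOT and p95 in the latency bullet are common LLM serving metrics: time to first token, time per output token, and 95th-percentile latency. The following is a minimal sketch of how they can be derived from per-request timestamps. It assumes a hypothetical `RequestTrace` record and uses the nearest-rank percentile, which is only one of several percentile definitions. In practice these numbers usually come from a serving engine's metrics and a monitoring backend, not from hand-written code.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical timing record for one generation request (seconds)."""
    start: float        # request received by the server
    first_token: float  # first output token streamed back
    end: float          # last output token streamed back
    output_tokens: int  # number of tokens generated


def ttft(r: RequestTrace) -> float:
    """Time to first token: includes queueing and prefill."""
    return r.first_token - r.start


def tpot(r: RequestTrace) -> float:
    """Time per output token: average decode step after the first token."""
    if r.output_tokens <= 1:
        return 0.0
    return (r.end - r.first_token) / (r.output_tokens - 1)


def e2e_latency(r: RequestTrace) -> float:
    """End-to-end latency as seen by the client."""
    return r.end - r.start


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, for 0 < p <= 100."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(traces: list[RequestTrace]) -> dict[str, float]:
    """p95 of each latency metric across a set of requests."""
    return {
        "ttft_p95": percentile([ttft(t) for t in traces], 95),
        "tpot_p95": percentile([tpot(t) for t in traces], 95),
        "e2e_p95": percentile([e2e_latency(t) for t in traces], 95),
    }
```

TTFT and TPOT are tracked separately because they respond to different optimizations. Prefill-side work such as batching and KV-cache reuse mainly moves TTFT. Decode-side work such as quantization and speculative decoding mainly moves TPOT.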
