SPREEAI is building the future of AI-powered commerce through photorealistic virtual try-on and multimodal intelligence. The company is looking for a Principal Engineer to build the infrastructure, deployment pipelines, and observability systems that take multimodal AI models from research prototypes to reliable, production-grade deployments.
Responsibilities:
- Build and operate SPREEAI’s end-to-end ML platform spanning training, evaluation, deployment, and monitoring
- Enable scalable and reliable training workflows through orchestration, infrastructure, and resource management systems
- Define platform standards for model packaging, model registry, dataset lineage, experiment tracking, checkpointing, and deployment automation
- Enable reliable and scalable inference deployments through standardized serving, orchestration, and monitoring frameworks
- Build and operate model deployment pipelines with versioning, reproducibility, rollback, approval gates, evaluation gates, and production observability
- Establish production SLOs for latency, availability, error rate, GPU saturation, cold-start time, cost per inference, and model quality drift
- Standardize and support serving infrastructure using modern inference runtimes such as vLLM, NVIDIA Triton, TensorRT-LLM, Ray Serve, TorchServe, ONNX Runtime, or equivalent systems
- Design and manage GPU allocation, scheduling, and resource utilization across training and inference workloads
- Improve GPU utilization, throughput, latency, reliability, and cost efficiency across the model lifecycle
- Design and operate model evaluation and benchmarking systems, including automated regression detection and quality gates for production releases
- Partner with research teams to productionize new capabilities by providing robust infrastructure, tooling, and deployment pathways