Architect and evolve our multi-agent orchestration platform (currently built on Hermes / Multica)
Design and implement voice AI pipelines: STT, real-time streaming TTS, VAD, and SIP/RTP telephony integration
Build and maintain RAG pipelines with retrieval quality measurement, re-ranking, and hybrid search over vector + keyword indexes
Fine-tune and evaluate LLMs for domain-specific tasks including customer support, classification, and structured extraction
Own the AI observability stack: tracing, instrumentation, cost tracking, and quality regression alerting
Define and enforce guardrails: hallucination detection, PII redaction, output safety scanning, and rate-limiting across multi-tenant deployments
Build data ingestion, preprocessing, and feature pipelines supporting model training and continual learning
Set architectural standards for AI systems across the group; conduct design reviews and own ADRs for major decisions
Mentor ML engineers and applied scientists; grow the team's capabilities in production AI
Engage with external research partners and track emerging work to identify techniques worth productionizing
Requirements
8+ years in ML Engineering, Applied AI, or Research Engineering, including at least 2 years in a lead or staff-level role
Deep, hands-on experience with LLMs in production: fine-tuning, RLHF/DPO, prompt engineering, RAG, and tool use
Fluent in Python and the core ML stack: PyTorch, Transformers (HuggingFace), PEFT/LoRA
Real experience with LLM inference serving — vLLM, TensorRT-LLM, or TGI — in a latency-sensitive production environment
Practical knowledge of agentic frameworks: multi-agent coordination, tool-call orchestration, context/memory management, and observability (Langfuse, Opik, or equivalent)
Experience with speech AI (ASR/TTS pipelines) or real-time audio systems is a strong plus
Solid understanding of MLOps: experiment tracking (MLflow/W&B), model registries, containerization (Docker/Kubernetes), and CI/CD for ML
Awareness of LLM-specific risks (hallucination, prompt injection, data leakage, and fairness and privacy concerns) and how to mitigate them in production
Strong communication skills: you can write a crisp design doc, run a productive architecture review, and explain tradeoffs to a non-technical stakeholder
Nice to have
Experience with voice pipelines end-to-end: VAD → ASR → LLM → TTS → SIP/RTP telephony