Quantiphi is an award-winning, AI-First global digital engineering company that helps leading Fortune 1000 organizations transform bold ideas into measurable business impact. The Senior Machine Learning Engineer role involves designing, developing, deploying, and maintaining advanced AI solutions, specializing in Generative AI and Agentic AI, particularly for the telecom sector.

Responsibilities:

Implement and manage comprehensive monitoring solutions for AI models and agent systems in production, tracking key performance indicators (KPIs), latency, throughput, and resource utilization
Develop and deploy robust drift detection mechanisms (data drift, concept drift, model drift) and establish proactive alerting systems to ensure model integrity and performance
Integrate AI systems with APM tools to gain deep insights into application behavior, identify bottlenecks, and optimize overall system performance
Implement distributed tracing across complex AI workflows and microservices to provide end-to-end visibility and facilitate efficient debugging and performance optimization
Collaborate with software engineers to integrate AI components seamlessly into existing and new full-stack applications, ensuring scalability, reliability, and maintainability
Deploy Generative and Agentic AI models and systems at scale, optimizing for latency, throughput, robustness, and cost-efficiency on cloud platforms (GCP preferred)
Implement and manage CI/CD pipelines for AI solutions, ensuring seamless integration, testing, and deployment
Develop strategies for continuous improvement, model retraining, and A/B testing in production environments
Design, develop, and optimize enterprise-grade Generative AI solutions, leveraging LLMs for various applications within the telecom domain
Implement sophisticated prompt engineering strategies (e.g., Chain of Thought, Few-Shot, RAG) to maximize LLM reasoning and creativity
Architect, develop, and deploy robust and scalable Agentic AI systems capable of autonomous decision-making, task execution, and complex problem-solving
Utilize and contribute to agentic frameworks such as Google ADK (Agent Development Kit) and LangChain to build sophisticated multi-agent systems and orchestrate complex workflows
Design and implement mechanisms for agent memory (short-term, long-term, episodic) and context management
Empower agents with advanced tool-use capabilities, integrating with external APIs, databases, and proprietary telecom systems
Actively research and integrate the latest advancements in Generative AI, LLMs, and Agentic AI, including self-correction, emergent behavior, and cognitive architectures
Collaborate closely with data scientists, product managers, and other engineering teams to define requirements, design solutions, and deliver high-impact AI products
Translate complex technical concepts and business value to diverse stakeholders, including non-technical audiences
Contribute to technical documentation, best practices, and knowledge sharing within the team

Requirements:

4-6 years of hands-on experience in Machine Learning and Artificial Intelligence
Proven track record in designing, developing, and deploying enterprise-grade Generative AI solutions
Strong experience in building and deploying complex agentic workflows and multi-agent systems
Demonstrated experience with MLOps principles and practices, including model monitoring, drift detection, and CI/CD for AI systems
Expert proficiency in Python and relevant ML/DL frameworks
Deep practical experience with Generative AI models, including various LLM architectures (e.g., Transformers, GPT, Llama)
Mandatory expertise with Agentic AI frameworks such as LangChain, LangGraph, and Google Agent Development Kit (ADK)
Experience with other agentic workflow tools like CrewAI, AutoGen, or similar
Strong understanding and hands-on experience with Performance Monitoring and End-to-End Tracing
Strong understanding and hands-on experience with MLOps / Production ML Systems, Monitoring
Strong understanding and hands-on experience with Drift Detection Techniques, Alerting & Observability
Strong understanding and hands-on experience with Application Performance Management (APM)
Strong understanding and hands-on experience with Systems Tracing
Strong understanding and hands-on experience with Full-Stack Application Architecture
Proficiency with version control systems (Git) and software development best practices (SDLC)
Solid background in Natural Language Processing (NLP) tasks (e.g., text generation, summarization, question answering, semantic search)
Ability to implement self-correction, self-improvement, and adaptive learning mechanisms in agents
Experience designing and managing agent memory systems (e.g., vector databases, knowledge graphs)
Proficiency in enabling tool use and API integration for agents
Understanding of multi-agent collaboration, coordination, and communication protocols
Exceptional problem-solving and analytical abilities, with a strong focus on delivering practical solutions
Excellent communication and presentation skills, capable of articulating complex technical concepts clearly
Ability to work independently, take initiative, and lead technical aspects of projects
Strong collaborative spirit, working effectively with cross-functional teams
Curiosity and a passion for staying abreast of the latest advancements in AI research and technology
Exposure to Google Cloud Platform (GCP) for model deployment, scaling, and infrastructure management
Prior Telco industry experience or familiarity with telecom data and use cases
Deployment of multi-agent systems at Telco production scale
Expertise in large-scale data processing and model optimization
Publications or open-source contributions in Agentic/Generative AI
