Quantiphi is an award-winning, AI-First global digital engineering company that helps leading Fortune 1000 organizations transform bold ideas into measurable business impact. The Senior Machine Learning Engineer role involves designing, developing, deploying, and maintaining advanced AI solutions, specializing in Generative AI and Agentic AI, particularly for the telecom sector.
Responsibilities:
- Implement and manage comprehensive monitoring solutions for AI models and agent systems in production, tracking key performance indicators (KPIs), latency, throughput, and resource utilization
- Develop and deploy robust drift detection mechanisms (data drift, concept drift, model drift) and establish proactive alerting systems to ensure model integrity and performance
- Integrate AI systems with APM tools to gain deep insights into application behavior, identify bottlenecks, and optimize overall system performance
- Implement distributed tracing across complex AI workflows and microservices to provide end-to-end visibility and facilitate efficient debugging and performance optimization
- Collaborate with software engineers to integrate AI components seamlessly into existing and new full-stack applications, ensuring scalability, reliability, and maintainability
- Deploy Generative and Agentic AI models and systems at scale, optimizing for latency, throughput, robustness, and cost-efficiency on cloud platforms (GCP preferred)
- Implement and manage CI/CD pipelines for AI solutions, ensuring seamless integration, testing, and deployment
- Develop strategies for continuous improvement, model retraining, and A/B testing in production environments
- Design, develop, and optimize enterprise-grade Generative AI solutions, leveraging LLMs for various applications within the telecom domain
- Implement sophisticated prompt engineering strategies (e.g., Chain of Thought, Few-Shot, RAG) to maximize LLM reasoning and creativity
- Architect, develop, and deploy robust and scalable Agentic AI systems capable of autonomous decision-making, task execution, and complex problem-solving
- Utilize and contribute to agentic frameworks such as Google ADK (Agent Development Kit) and LangChain to build sophisticated multi-agent systems and orchestrate complex workflows
- Design and implement mechanisms for agent memory (short-term, long-term, episodic) and context management
- Empower agents with advanced tool-use capabilities, integrating with external APIs, databases, and proprietary telecom systems
- Actively research and integrate the latest advancements in Generative AI, LLMs, and Agentic AI, including self-correction, emergent behavior, and cognitive architectures
- Collaborate closely with data scientists, product managers, and other engineering teams to define requirements, design solutions, and deliver high-impact AI products
- Translate complex technical concepts and business value to diverse stakeholders, including non-technical audiences
- Contribute to technical documentation, best practices, and knowledge sharing within the team
Requirements:
- 4-6 years of hands-on experience in Machine Learning and Artificial Intelligence
- Proven track record in designing, developing, and deploying enterprise-grade Generative AI solutions
- Strong experience in building and deploying complex agentic workflows and multi-agent systems
- Demonstrated experience with MLOps principles and practices, including model monitoring, drift detection, and CI/CD for AI systems
- Expert proficiency in Python and relevant ML/DL frameworks
- Deep practical experience with Generative AI models, including various LLM architectures (e.g., Transformers, GPT, Llama)
- Mandatory expertise with Agentic AI frameworks such as LangChain, LangGraph, and Google Agent Development Kit (ADK)
- Experience with other agentic workflow tools like CrewAI, AutoGen, or similar
- Strong understanding and hands-on experience with Performance Monitoring and End-to-End Tracing
- Strong understanding and hands-on experience with MLOps / Production ML Systems, Monitoring
- Strong understanding and hands-on experience with Drift Detection Techniques, Alerting & Observability
- Strong understanding and hands-on experience with Application Performance Management (APM)
- Strong understanding and hands-on experience with Systems Tracing
- Strong understanding and hands-on experience with Full-Stack Application Architecture
- Proficiency with version control systems (Git) and software development life cycle (SDLC) best practices
- Solid background in Natural Language Processing (NLP) tasks (e.g., text generation, summarization, question answering, semantic search)
- Ability to implement self-correction, self-improvement, and adaptive learning mechanisms in agents
- Experience designing and managing agent memory systems (e.g., vector databases, knowledge graphs)
- Proficiency in enabling tool use and API integration for agents
- Understanding of multi-agent collaboration, coordination, and communication protocols
- Exceptional problem-solving and analytical abilities, with a strong focus on delivering practical solutions
- Excellent communication and presentation skills, capable of articulating complex technical concepts clearly
- Ability to work independently, take initiative, and lead technical aspects of projects
- Strong collaborative spirit, working effectively with cross-functional teams
- Curiosity and a passion for staying abreast of the latest advancements in AI research and technology
- Exposure to Google Cloud Platform (GCP) for model deployment, scaling, and infrastructure management
- Prior Telco industry experience or familiarity with telecom data and use cases
- Experience deploying multi-agent systems at Telco production scale
- Expertise in large-scale data processing and model optimization
- Publications or open-source contributions in Agentic/Generative AI