
Hybrid
2-3 days per week onsite at the client's Irvine, CA office
1 day per week onsite at the client's Downtown Los Angeles office
1 day remote
We are seeking a highly skilled Senior AI Engineer to lead the design, development, and operationalization of a production-grade Generative AI and Data Platform on AWS. This role will be responsible for building scalable AI solutions that leverage Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), vector search, knowledge graphs, and governed data pipelines.
The ideal candidate will have deep expertise across the complete AI lifecycle, including data ingestion, knowledge engineering, embeddings generation, retrieval systems, backend API development, MLOps, and production deployment. This individual will work closely with product, engineering, and platform teams to enable AI-powered capabilities in customer-facing applications while helping evolve the organization toward agentic AI architectures.
Design, build, and operationalize LLM-powered applications using:
Retrieval-Augmented Generation (RAG)
Embedding pipelines
Prompt orchestration frameworks
Evaluation and experimentation frameworks
Develop and optimize vector search solutions using Amazon OpenSearch.
Design and implement graph-based knowledge systems using Amazon Neptune to support:
Relationship modeling
Knowledge lineage
Explainability
Knowledge discovery
Integrate supporting AWS services including:
Amazon ElastiCache (Redis) for caching and session management
Amazon DynamoDB for low-latency, scalable data access
Build agentic AI workflows using frameworks such as:
LangGraph
AutoGen
CrewAI
Equivalent agent orchestration frameworks
Implement LLM application frameworks including:
LangChain
LlamaIndex
Establish standards for:
Tool integration
Context management
Shared memory patterns
MCP-style architectures and context-sharing mechanisms
Evaluate and optimize:
Model performance
Retrieval effectiveness
Latency
Cost efficiency
Context window utilization
Design and develop scalable data pipelines using Databricks and Apache Spark.
Build and maintain:
Data ingestion pipelines
Data transformation workflows
Document processing pipelines
Metadata enrichment processes
Embedding generation and indexing workflows
Implement document preparation techniques including:
Chunking strategies
Metadata tagging
Semantic enrichment
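As an illustration of the chunking and metadata tagging items above, the following is a minimal sketch, not a prescribed implementation. It assumes fixed-size character windows with overlap; the function name `chunk_text`, the `source` field, and the default sizes are hypothetical choices made for this example.

```python
def chunk_text(text: str, source: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping character windows, tagging each with metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({
            "text": piece,
            "source": source,            # hypothetical metadata tag
            "start": start,              # character offsets kept for lineage
            "end": start + len(piece),
        })
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

Overlap keeps sentences that straddle a boundary retrievable from at least one chunk. A production pipeline would more likely split on tokens or document structure than on raw characters.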
Ensure high standards of data quality through:
Validation frameworks
Completeness checks
Consistency monitoring
Data observability
Implement data governance controls including:
Data classification
Access management
Retention policies
Auditability
Lineage tracking
Design and develop scalable backend services exposing AI platform capabilities.
Build secure, reusable APIs and microservices for enterprise applications.
Establish best practices for:
API design
Versioning
Reliability
Retry mechanisms
Circuit breakers
Idempotent operations
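For context on the retry and idempotency items above, here is a minimal sketch under stated assumptions. It retries only on `ConnectionError`, uses exponential backoff, and caches results by a caller-supplied key. The names `retry` and `IdempotentStore` are hypothetical, and a real service would persist keys in a shared store rather than in process memory.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3,
          base_delay: float = 0.1, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # delay doubles on each retry
    raise AssertionError("unreachable")


class IdempotentStore:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}

    def apply(self, key: str, operation: Callable[[], object]) -> object:
        if key not in self._results:
            self._results[key] = operation()
        return self._results[key]
```

Pairing the two matters: a retried request that carries the same idempotency key does not repeat its side effect if the first attempt succeeded but its response was lost.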
Enable platform reusability across multiple teams and business applications.
Design and maintain CI/CD pipelines for AI, ML, and data workloads.
Deploy and manage production systems using:
Docker
Kubernetes
Implement deployment strategies including:
Blue-Green Deployments
Canary Releases
Rollback Mechanisms
Feature Flagging
Ensure platform reliability through:
Monitoring
Logging
Alerting
Observability
Cost tracking
Data freshness monitoring
Implement:
Secrets management
Role-based access controls
Least-privilege security practices
Continuously optimize platform performance, scalability, and cost.
Define and measure AI quality metrics including:
Grounding/Faithfulness
Retrieval relevance
Response consistency
Hallucination rates
Latency
Cost per request
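To make the retrieval relevance metric above concrete, the sketch below computes recall@k and reciprocal rank for a single query. It assumes a labeled set of relevant document IDs is available for each query; the function names and the example IDs are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Example: two relevant documents, ranked 2nd and 4th by the retriever.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.5
```

Averaging these values over an evaluation set gives offline numbers that can be tracked across prompt and index versions.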
Build and maintain:
Prompt versioning frameworks
Offline evaluation pipelines
Automated testing processes
Continuous improvement workflows
Drive AI quality improvements through experimentation and monitoring.
Implement secure AI solutions with:
Authentication
Authorization
Access controls
Data protection mechanisms
Establish responsible AI guardrails.
Ensure compliance with organizational and industry standards related to:
AI safety
Privacy
Governance
Monitoring
Auditability
Bachelor's or Master's degree in:
Computer Science
Data Science
Artificial Intelligence
Machine Learning
Related technical discipline
Strong hands-on experience building production-grade Generative AI solutions.
Expertise in:
Retrieval-Augmented Generation (RAG)
Embeddings
Prompt engineering
Retrieval optimization
Hands-on expertise with:
Amazon OpenSearch (Vector Search)
Amazon Neptune
Amazon DynamoDB
Amazon ElastiCache (Redis)
Experience with:
LangChain
LlamaIndex
Hands-on experience with:
LangGraph
AutoGen
CrewAI
Similar agent orchestration frameworks
Strong experience with:
Databricks
Apache Spark
Large-scale data pipelines
Embedding pipelines
Strong Python development experience.
Experience building scalable APIs and microservices.
Strong understanding of distributed systems and service-oriented architectures.
Experience with:
CI/CD pipelines
Docker
Kubernetes
Production AI deployments
Experience with AI evaluation and observability platforms.
Experience implementing AI governance and compliance frameworks.
Advanced Kubernetes and MLOps experience.
Familiarity with:
Model Context Protocol (MCP)
Agent-based architectures
Multi-agent systems
Knowledge graph ecosystems
Preferred experience in one or more of the following:
AI/ML Platform Engineering
Generative AI Applications
Enterprise AI Platforms
Data Platforms & Big Data Engineering
Knowledge Management Systems
One or more AWS certifications:
AWS Certified Solutions Architect
AWS Certified Machine Learning - Specialty
AWS Certified Data Engineer
Strong analytical and problem-solving abilities.
Excellent communication and stakeholder management skills.
Ability to explain complex AI concepts to technical and non-technical audiences.
Collaborative and cross-functional mindset.
Strong ownership mentality with proactive execution.
Ability to thrive in fast-paced, evolving environments.
Candidates must demonstrate hands-on production experience in:
Generative AI / LLMs (RAG, Embeddings, Prompt Engineering)
AWS Cloud Services (OpenSearch, Neptune, DynamoDB, Redis/ElastiCache)
Vector Search & Retrieval Systems
Knowledge Graphs / Graph Databases (Amazon Neptune)
LangChain and/or LlamaIndex
Agentic AI Frameworks (LangGraph, AutoGen, CrewAI)
Databricks & Apache Spark
Python Backend Development & API Engineering
Production Deployment using Docker and Kubernetes
AI Platform Architecture and End-to-End Delivery