Everest Technologies, Inc. is seeking a Senior Data Engineer to build the memory and knowledge backbone of its Agentic AI ecosystem. The role involves designing data pipelines, optimizing data schemas for AI consumption, and managing vector databases so that AI agents have access to accurate and relevant enterprise data.
Responsibilities:
- Design and optimize data schemas specifically for LLM consumption, ensuring that data retrieved via MCP servers is structured to minimize token usage and maximize reasoning accuracy
- Build robust data pipelines using Python (for AI/ML workflows) and C#/.NET (for enterprise integration) to move data from legacy systems into AI-ready formats
- Implement and maintain vector databases (e.g., Pinecone, Weaviate, or Milvus) to support Retrieval-Augmented Generation (RAG) alongside live API tool calls
- Work with the Gravitee API Gateway to enforce data masking, PII redaction, and fine-grained access control before data reaches an LLM
- Manage the OpenAPI and MCP metadata that allows AI agents to 'understand' the data they are querying
Requirements:
- 5+ years of work experience
- Expert-level Python (Pandas, PySpark, SQLAlchemy)
- Strong familiarity with C# for interacting with .NET-based data layers
- Hands-on experience with vector databases and embedding models
- Understanding of how data is exposed through Gravitee APIM and secured via MCP-specific authorization flows
- Experience with SQL/NoSQL databases, dbt, and cloud data warehouses (Snowflake, BigQuery, or Databricks)
- Familiarity with the Model Context Protocol (MCP)
- Experience building knowledge graphs to provide relational context to AI agents
- Familiarity with semantic caching to reduce LLM costs and improve response times
- Knowledge of Gravitee Observability for monitoring data drift in agentic conversations