Pyramid Consulting, Inc. is seeking a talented Data Engineer for a long-term contract opportunity. The role involves designing and optimizing data pipelines, processing unstructured data, and collaborating with AI/ML engineers to prepare datasets for advanced analytics applications.

Responsibilities:

Design, build, and optimize medallion architecture pipelines (Bronze → Silver → Gold) in Databricks using Delta Lake
Ingest and process unstructured data (PDFs, images, documents, logs) from enterprise source systems (SAP, Salesforce, TrackWise, Azure DevOps, etc.)
Curate and model data into structured, query-ready schemas to power AI agents and analytics applications
Develop and maintain data quality frameworks, validation checks, and monitoring across pipeline stages
Collaborate with AI/ML engineers to prepare datasets for LLM-powered agents (Claude, GPT), including embeddings, chunking strategies, and retrieval-augmented generation (RAG) pipelines
Support Unity Catalog governance, access controls, and schema management across dev/prod workspaces
Partner with cross-functional teams (Quality, Regulatory, IT, R&D) to translate business requirements into scalable data solutions

Requirements:

7+ years of hands-on experience with Databricks (Delta Lake, Spark, SQL, Python)
Proven experience implementing medallion architecture (Bronze → Silver → Gold) at scale
Strong expertise working with unstructured data: parsing, transforming, and curating documents, PDFs, and text into structured data models
Experience building data pipelines that feed AI/ML agents or LLM-based applications
Working knowledge of Claude (Anthropic) and GPT (OpenAI) model integration, prompt engineering, or agent orchestration
Proficiency in Python and SQL; experience with Databricks notebooks, workflows, and jobs
Solid understanding of data modeling best practices for analytics and AI consumption
