Pyramid Consulting, Inc. is seeking a talented Data Engineer for a long-term contract opportunity. The role involves designing and optimizing data pipelines, processing unstructured data, and collaborating with AI/ML engineers to prepare datasets for advanced analytics applications.
Responsibilities:
- Design, build, and optimize medallion architecture pipelines (Bronze → Silver → Gold) in Databricks using Delta Lake
- Ingest and process unstructured data (PDFs, images, documents, logs) from enterprise source systems (SAP, Salesforce, TrackWise, Azure DevOps, etc.)
- Curate and model data into structured, query-ready schemas to power AI agents and analytics applications
- Develop and maintain data quality frameworks, validation checks, and monitoring across pipeline stages
- Collaborate with AI/ML engineers to prepare datasets for LLM-powered agents (Claude, GPT), including embeddings, chunking strategies, and retrieval-augmented generation (RAG) pipelines
- Support Unity Catalog governance, access controls, and schema management across dev/prod workspaces
- Partner with cross-functional teams (Quality, Regulatory, IT, R&D) to translate business requirements into scalable data solutions
Requirements:
- 7+ years of hands-on experience with Databricks (Delta Lake, Spark, SQL, Python)
- Proven experience implementing medallion architecture (Bronze → Silver → Gold) at scale
- Strong expertise working with unstructured data: parsing, transforming, and curating documents, PDFs, and text into structured data models
- Experience building data pipelines that feed AI/ML agents or LLM-based applications
- Working knowledge of integrating Claude (Anthropic) and GPT (OpenAI) models, prompt engineering, or agent orchestration
- Proficiency in Python and SQL; experience with Databricks notebooks, workflows, and jobs
- Solid understanding of data modeling best practices for analytics and AI consumption