TechTorch is building the future of intelligent work by helping companies design, build, and deploy AI agents to automate complex workflows. The AI-Enabled Data Engineer will focus on creating scalable data pipelines, managing data quality, and integrating AI capabilities into data engineering processes.

Responsibilities:

Design, build, and maintain scalable data pipelines and ETL/ELT workflows across cloud and on-prem environments
Work with Snowflake, Databricks, and Delta Lake as primary data platforms — handling ingestion, transformation, storage optimization, and access patterns
Model data with dbt: write modular SQL transformations, manage dependencies, enforce data contracts, and maintain documentation
Build and maintain semantic layers that serve consistent, governed metrics to downstream consumers
Design data warehouse schemas and data lake structures that balance performance, cost, and queryability
Implement data quality frameworks — testing, validation, alerting, and lineage — as first-class citizens in every pipeline
Orchestrate workflows across Airflow, Dagster/Prefect, Azure Data Factory, and Databricks Workflows — choosing the right tool for each job
Apply DataOps practices: CI/CD for data pipelines, environment promotion, infrastructure as code, and observability
Own the reliability of data products end-to-end — monitoring, alerting, incident response, and root cause analysis
Work across AWS and Azure cloud services (S3, Glue, ADLS, ADF, Synapse, Redshift) to design cost-effective, scalable architectures
Build data pipelines that feed AI systems — including RAG ingestion workflows, vector store loading, document chunking, and embedding pipelines
Use LLMs as active components in ETL logic: classification, entity extraction, enrichment, and in-flight data quality remediation
Expose data infrastructure as consumable tools for AI agents via MCP or similar agent-integration patterns
Use AI-paired programming (Claude Code or equivalent) as a daily productivity layer — not just autocomplete, but genuine workflow acceleration
Stay current on how AI tooling changes the data engineering workflow and bring those patterns back to the team

Requirements:

ETL/ELT Design
Data Modeling
Data Quality & Testing
Data Lineage
Batch & Incremental Loads
Snowflake
Databricks
Apache Spark / PySpark
Delta Lake
Data Warehouses
Data Lakes
dbt Core / dbt Cloud
SQL (advanced)
Semantic Layer
Dimensional Modeling
Apache Airflow
Dagster / Prefect
Azure Data Factory
Databricks Workflows
RAG & Vector Store Pipelines
AI-Augmented ETL
MCP / Agent Data Tools
AI-Paired Programming
LLM Integration in Pipelines
AWS (S3, Glue, Redshift)
Azure (ADLS, ADF, Synapse)
CI/CD for Data
Infrastructure as Code
Python
Experience with streaming architectures: Kafka, Spark Streaming, or Flink
Exposure to feature stores (Feast, Tecton) or ML platform data pipelines
Hands-on with vector databases: Pinecone, Weaviate, Qdrant, or pgvector
Familiarity with data mesh or data product ownership models
Experience with Snowpark or Databricks AI/BI tooling
Building or contributing to internal data tooling, frameworks, or accelerators
