TechTorch is building the future of intelligent work by helping companies design, build, and deploy AI agents to automate complex workflows. The AI-Enabled Data Engineer will focus on creating scalable data pipelines, managing data quality, and integrating AI capabilities into data engineering processes.
Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL/ELT workflows across cloud and on-prem environments
- Work with Snowflake, Databricks, and Delta Lake as primary data platforms — handling ingestion, transformation, storage optimization, and access patterns
- Model data with dbt: write modular SQL transformations, manage dependencies, enforce data contracts, and maintain documentation
- Build and maintain semantic layers that serve consistent, governed metrics to downstream consumers
- Design data warehouse schemas and data lake structures that balance performance, cost, and queryability
- Implement data quality frameworks — testing, validation, alerting, and lineage — as first-class citizens in every pipeline
- Orchestrate workflows across Airflow, Dagster/Prefect, Azure Data Factory, and Databricks Workflows — choosing the right tool for each job
- Apply DataOps practices: CI/CD for data pipelines, environment promotion, infrastructure as code, and observability
- Own the reliability of data products end-to-end — monitoring, alerting, incident response, and root cause analysis
- Work across AWS (S3, Glue, Redshift) and Azure (ADLS, ADF, Synapse) cloud services to design cost-effective, scalable architectures
- Build data pipelines that feed AI systems — including RAG ingestion workflows, vector store loading, document chunking, and embedding pipelines
- Use LLMs as active components in ETL logic: classification, entity extraction, enrichment, and in-flight data quality remediation
- Expose data infrastructure as consumable tools for AI agents via MCP or similar agent-integration patterns
- Use AI-paired programming (Claude Code or equivalent) every day to accelerate the whole development workflow, not just as autocomplete
- Stay current on how AI tooling changes the data engineering workflow and bring those patterns back to the team
Requirements:
- ETL/ELT Design
- Data Modeling
- Data Quality & Testing
- Data Lineage
- Batch & Incremental Loads
- Snowflake
- Databricks
- Apache Spark / PySpark
- Delta Lake
- Data Warehouses
- Data Lakes
- dbt Core / dbt Cloud
- SQL (advanced)
- Semantic Layer
- Dimensional Modeling
- Apache Airflow
- Dagster / Prefect
- Azure Data Factory
- Databricks Workflows
- RAG & Vector Store Pipelines
- AI-Augmented ETL
- MCP / Agent Data Tools
- AI-Paired Programming
- LLM Integration in Pipelines
- AWS (S3, Glue, Redshift)
- Azure (ADLS, ADF, Synapse)
- CI/CD for Data
- Infrastructure as Code
- Python
Nice to have:
- Experience with streaming architectures: Kafka, Spark Streaming, or Flink
- Exposure to feature stores (Feast, Tecton) or ML platform data pipelines
- Hands-on with vector databases: Pinecone, Weaviate, Qdrant, or pgvector
- Familiarity with data mesh or data product ownership models
- Experience with Snowpark or Databricks AI/BI tooling
- Building or contributing to internal data tooling, frameworks, or accelerators