Design, build, and maintain scalable, reliable data pipelines across GCP and AWS infrastructure, with BigQuery as our warehouse
Own and evolve our Dagster orchestration layer, ensuring pipelines are observable, testable, and operationally robust
Architect and implement ingestion patterns for diverse source systems, from SaaS APIs to acquired firm data with unstructured schemas
Define and enforce data quality standards at the ingestion layer: completeness, freshness, lineage, security, privacy, and schema contracts
Build the technical playbook for onboarding acquired firms’ data into Lawhive’s canonical data model
Design repeatable ELT patterns that handle conflicting schemas, messy legacy systems, and varying data quality, making firm onboarding a weeks-not-months process
Partner with Analytics Engineering on the canonical Lawhive data model, ensuring upstream pipelines deliver clean, well-structured data
Enable access controls and privacy-preserving access to firm-tenanted data
Apply LLMs and AI tooling (Claude Code, Cursor) to data engineering tasks: entity resolution, schema mapping, automated data quality checks, and pipeline generation
Partner with our AI/ML teams to build reliable data pipelines that feed model training and inference workflows
Build scalable storage and processing solutions for data and AI projects and products
Proactively monitor and optimise BigQuery usage for query performance and cost efficiency as data volumes grow
Evaluate and recommend tooling changes to keep the stack modern, efficient, and fit for AI-native workflows
Work closely with the Analytics Engineer and Data Analysts to ensure the platform supports self-serve analytics and the dbt semantic layer
Partner with Product and Engineering to instrument new product features and surface clean event data
Contribute to documentation and runbooks that make the platform accessible and understandable across the team
Requirements
5+ years of data engineering experience, including hands-on ownership of production pipelines at a SaaS or tech scaleup
Deep expertise in cloud data warehouses, ideally BigQuery, including performance tuning, partitioning, clustering, and cost management
Comfortable with Python for pipeline development, with experience in orchestration tools (Dagster, Airflow, or similar)
Experience building data integration patterns for complex or heterogeneous source systems. Bonus if in an M&A or multi-entity context
Strong opinions on data modelling, pipeline design, and the modern data stack; able to defend trade-offs and push back on bad patterns
AI-native in how you work: you use Cursor, Claude Code, or equivalent tools daily and believe LLMs structurally change how data engineering gets done
Collaborate effectively with Analytics Engineers and Analysts, understanding where the pipeline ends and modelling begins
Commercially literate enough to translate business context into infrastructure decisions
Tech Stack
Airflow
AWS
BigQuery
Cloud
Google Cloud Platform
Python
Benefits
💰 Meaningful early-stage equity at one of Europe’s fastest growing startups
✈️ 33 days’ annual leave (25 + bank holidays) plus your birthday off