The Senior Data Engineer plays a central role in building Causeway’s next-generation data platform on Databricks.
The Senior Data Engineer owns significant slices of the medallion pipeline (bronze to silver to gold), with a focus on making durable architectural decisions.
Responsibilities
Design and own ingestion into the bronze layer from a variety of sources, selecting the appropriate pattern per source, including Auto Loader, API pull, event-driven, database replica/CDC, and CQRS read models.
Build and maintain the silver and gold medallion layers as Delta Live Tables and PySpark notebooks, handling deduplication, entity resolution, canonical ID assignment, and projection into typed object and relationship tables.
Generate and maintain the typed relationship graph that transforms disconnected gold tables into a traversable digital twin.
Own the data quality and testing story end-to-end, implementing automated frameworks that validate completeness, accuracy, consistency, and schema conformance across pipeline stages.
Monitor pipeline health and data quality metrics, proactively identifying and resolving issues before they affect downstream consumers or agent behaviour.
Mentor engineers across the data team, lead architectural discussions, conduct PR reviews, and challenge poor patterns, including those already in production.
Requirements
Strong production experience on Databricks, including Delta Lake, Delta Live Tables, Auto Loader, Unity Catalog, Databricks Asset Bundles, serverless and job compute, and Structured Streaming for near-real-time workloads.
Deep proficiency in Python and PySpark for large-scale data processing and transformation.
Advanced SQL, including recursive CTEs for graph traversal, window functions, query planning, and the ability to interpret EXPLAIN output and optimise index usage.
Strong experience with lakehouse architectures and data modelling for graph workloads — typed entities, edge tables, dimensional vs event-style modelling, and entity resolution across systems that do not share keys.
Hands-on experience with PostgreSQL as a serving layer, including pgvector for semantic search, pg_trgm for fuzzy matching, HNSW vs IVFFlat trade-offs, index tuning, and managing interactive-latency queries under concurrency.
Expertise in data transformation, validation, and contract design.
Experience building data test and observability frameworks.
Working knowledge of cloud storage and identity across AWS, Azure, and GCP — including S3, ADLS Gen2, GCS, cross-cloud copy, IAM roles, and storage credentials.
Genuine curiosity about AI agents and how they consume data, with an understanding of what makes a dataset agent-legible.
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Postgres
PySpark
Python
SQL
Unity Catalog
Benefits
We are strong advocates of work-life balance and offer hybrid working alongside the opportunity to work from modern, collaborative offices.
Causeway is a carbon-neutral company, and we offset our calculated carbon footprint.