Allstate is a company dedicated to protecting families and their belongings from life's uncertainties. It is seeking a Senior Data Engineer to design, build, and operate scalable data pipelines that support enterprise analytics and advanced data use cases. The role involves collaborating with analytics, platform, security, and governance teams to ensure data quality and performance while developing robust data solutions.
Responsibilities:
- Design, build, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud‑native data technologies
- Develop and optimize ETL/ELT workflows to ingest, transform, and curate data from diverse source systems into analytics‑ready datasets
- Implement data modeling and transformation logic to support reporting, dashboards, and downstream analytical and machine learning workloads
- Build and manage data processing workloads within modern lakehouse platforms, including Microsoft Fabric / OneLake (preferred)
- Ensure data quality, reliability, and consistency by implementing validation checks, monitoring, and reconciliation processes
- Optimize Spark jobs for performance, cost efficiency, and scalability across large and complex datasets
- Manage and evolve data schemas while handling schema drift and upstream source changes
- Develop reusable frameworks, libraries, and standardized patterns to improve data engineering productivity and consistency
- Implement CI/CD pipelines for data workloads to enable automated testing, deployment, and rollback
- Monitor data pipelines and jobs, troubleshoot failures, and resolve performance or data quality issues
- Partner with analytics engineers, BI developers, and data scientists to understand data requirements and deliver curated datasets
- Collaborate with platform, security, and governance teams to ensure data security, compliance, and proper access controls
- Contribute to Agile delivery processes, including sprint planning, design reviews, and continuous improvement initiatives
Requirements:
- Strong experience as a Data Engineer building and operating production data pipelines
- Hands‑on experience with Apache Spark for large‑scale data processing
- Proficiency in Python, SQL, and data transformation best practices
- Experience with cloud‑based data platforms and storage (e.g., data lakes, lakehouse architectures)
- Familiarity with Microsoft Fabric, OneLake, or similar analytics platforms (strong plus)
- Experience designing and optimizing data models for analytical workloads
- Understanding of distributed data processing concepts, performance tuning, and fault tolerance
- Experience with CI/CD, version control, and infrastructure‑as‑code concepts
- Strong problem‑solving skills and ability to troubleshoot complex data issues
- Excellent communication skills and ability to collaborate across technical and non‑technical teams
- 4+ years of experience in data engineering or an equivalent role (preferred)
- Experience with real‑time or event‑driven data processing
- Familiarity with data governance, metadata management, and data quality frameworks
- Exposure to orchestration tools and workflow management systems
- Experience supporting analytical, reporting, or machine learning use cases