Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster. They are seeking a Senior Data Engineer to lead the design and implementation of their next-generation Data Lakehouse, focusing on scalable data storage and analytics. The role involves managing data ingestion pipelines and optimizing performance for large-scale data processing.

Responsibilities:

Design and implement a scalable Lakehouse environment. Own the lifecycle of efficient storage, including schema evolution, partition transformation, and snapshot management
Build and manage real-time and batch data ingestion pipelines using Kafka and Spark
Deploy, manage, and scale data workloads (such as Spark executors) using Kubernetes (EKS)
Develop complex ETL workflows. Optimize jobs for large-scale data processing, focusing on memory management and shuffle optimization
Manage and monitor end-to-end data lifecycles using orchestration tools
Write and tune high-performance queries against the Lakehouse. Implement optimization strategies, compaction, and data skipping to reduce latency and cloud costs

Requirements:

5+ years in Data Engineering, with at least 2 years focused on Lakehouse or modern Data Lake environments
Hands-on experience with the AWS ecosystem, async/queue processing in a production environment, and working with a Lakehouse
Expert in workflow management, data transformation, storage and exchange at scale
Proven experience with event-driven job processing. Mastery of data ingestion, large-scale data processing, and performance tuning
Experience with Kubernetes and infrastructure as code
Experience with query engines like StarRocks or Druid
Familiarity with building and monitoring dashboards on K8s-based data clusters
