Veeva Systems is a mission-driven organization and a pioneer in industry cloud, helping life sciences companies bring therapies to patients faster. The company is seeking a Senior Data Engineer to lead the design and implementation of its next-generation Data Lakehouse, with a focus on scalable data storage and analytics. The role involves managing data ingestion pipelines and optimizing performance for large-scale data processing.
Responsibilities:
- Design and implement a scalable Lakehouse environment. Own the storage lifecycle, including schema evolution, partition transformations, and snapshot management
- Build and manage real-time and batch data ingestion pipelines using Kafka and Spark
- Deploy, manage, and scale data workloads (such as Spark executors) using Kubernetes (EKS)
- Develop complex ETL workflows. Optimize jobs for large-scale data processing, with a focus on memory management and shuffle optimization
- Manage and monitor end-to-end data lifecycles using orchestration tools
- Write and tune high-performance queries against the Lakehouse. Implement optimization strategies such as compaction and data skipping to reduce latency and cloud costs
Requirements:
- 5+ years in Data Engineering, with at least 2 years focused on Lakehouse or modern Data Lake environments
- Hands-on experience with the AWS ecosystem, asynchronous/queue-based processing in production, and Lakehouse environments
- Expertise in workflow management and in data transformation, storage, and exchange at scale
- Proven experience with event-driven job processing. Mastery of data ingestion, large-scale data processing, and performance tuning
- Experience with Kubernetes and infrastructure as code
- Experience with query engines such as StarRocks or Druid
- Familiarity with building and monitoring dashboards on K8s-based data clusters