Cloudera is a leading company in data management and analytics, empowering organizations to transform complex data into actionable insights. They are seeking a Staff Software Engineer with expertise in distributed systems to join the Apache Spark Team, focusing on building next-generation features for their Data Engineering Experience and contributing to the open-source community. The role involves architecting scalable solutions, enhancing engineering velocity, and collaborating with a high-impact team to tackle large-scale challenges in data processing.
Responsibilities:
- Pioneer Scalable Solutions: Architect, implement, and deliver next-generation features for Cloudera’s Data Engineering Experience, operating at a massive scale on thousands of production nodes
- Drive Open-Source Innovation: Be a core contributor to Apache Spark, directly shaping the future of distributed data processing in the open-source community
- Build with Modern Stacks: Develop high-performance features using Scala, Java, and Python on modern data platforms
- Deepen Technical Mastery: Gain and apply expert-level knowledge in core distributed data processing concepts, including:
  - SQL Planners and Optimizers
  - Data layout and modern table formats like Apache Parquet and Iceberg
  - Fault tolerance and resilience in large-scale distributed systems
- Own the Technology Stack: Develop a deep technical understanding of components across the Cloudera Data Engineering Experience, with a focus on Iceberg and Spark, applying this knowledge to your daily tasks
- Conquer Large-Scale Challenges: Work hands-on with massive distributed systems, scaling from hundreds to thousands of nodes in live production clusters
- Ensure System Integrity: Conduct thorough root cause analysis, debug complex system-level deployment issues, and resolve failures to maintain high system quality
- Enhance Engineering Velocity: Improve internal infrastructure and tooling to streamline development, testing, and deployment processes
- Collaborate and Influence: Work closely with a high-impact, distributed team and stakeholders to drive product vision and delivery
Requirements:
- 5-7+ years of experience in professional software development
- Proven experience leading technical initiatives and delivering complex product enhancements from concept to production
- Strong proficiency in Java, Scala, or another JVM-based language
- Solid experience in the design and development of distributed systems
- Passion for clean coding, attention to detail, and a focus on software quality and maintainability
- Strong oral and written communication skills for effective collaboration across a distributed team
- Demonstrated ability to research, problem-solve, and operate independently without constant supervision
- An open-minded approach with a desire to learn new technologies and an unwavering passion for building exceptional products
- Spark & Ecosystem: Experience using or developing Apache Spark, Apache Iceberg, or related technologies
- Deep experience with large-scale, distributed systems design and development, including a strong understanding of scaling, performance optimization, and scheduling
- Experience with SQL Planners and Optimizers
- Prior experience as a contributor to open-source projects