BairesDev is a leading technology company that delivers cutting-edge solutions to major clients and innovative startups. The company is seeking a Distributed Systems Engineer with expertise in Apache big data internals to contribute production-grade code and optimize engine internals at petabyte scale.
Responsibilities:
- Contribute production-grade code to Apache big data projects
- Debug and optimize engine internals — query planning, distributed execution, scheduling, state management, replication, storage layers, and metadata services — at petabyte scale
- Influence architectural direction for performance and scalability at the engine layer
- Profile and tune JVM behavior (GC, memory layout, concurrency)
- Collaborate with cross-functional engineering teams and open source committers on integrations and ecosystem work
- Mentor senior engineers and raise the engineering bar through code reviews and design critiques
Requirements:
- 6+ years of experience in software development
- Strong Java and/or Scala skills
- Experience with distributed systems and concurrent or parallel programming
- Working knowledge of internals of at least one Apache big data project: Spark, Flink, Trino, Ozone, Iceberg, Hive, NiFi, Kafka, Hadoop, HBase, Impala, or Kudu
- Familiarity with JVM performance characteristics (GC, memory, threading)
- Advanced level of English
Nice to have:
- Upstream contributions to Apache big data projects; committer or PMC status is a strong plus
- Experience operating distributed systems at petabyte scale in production
- Experience with adjacent or comparable engines (PrestoDB, Impala, Druid, Pinot, ClickHouse, CockroachDB)
- Kubernetes and cloud-native deployment experience
- Public technical presence (talks, blogs, OSS community leadership)