BairesDev is a leading technology company that delivers cutting-edge solutions to major clients and innovative startups. They are seeking a Distributed Systems Engineer with expertise in Apache big data internals to contribute production-grade code and optimize engine internals at petabyte scale.

Responsibilities:

Contribute production-grade code to Apache big data projects
Debug and optimize engine internals — query planning, distributed execution, scheduling, state management, replication, storage layers, and metadata services — at petabyte scale
Influence architectural direction for performance and scalability at the engine layer
Profile and tune JVM behavior (GC, memory layout, concurrency)
Collaborate with cross-functional engineering teams and open source committers on integrations and ecosystem work
Mentor senior engineers and raise the engineering bar through code reviews and design critiques

Requirements:

6+ years of experience in software development
Strong Java and/or Scala skills
Experience with distributed systems and concurrent or parallel programming
Working knowledge of internals of at least one Apache big data project: Spark, Flink, Trino, Ozone, Iceberg, Hive, NiFi, Kafka, Hadoop, HBase, Impala, or Kudu
Familiarity with JVM performance characteristics (GC, memory, threading)
Advanced level of English
Upstream contributions to Apache big data projects; committer or PMC status is a strong plus
Experience operating distributed systems at petabyte scale in production
Experience with adjacent or comparable engines (PrestoDB, Impala, Druid, Pinot, ClickHouse, CockroachDB)
Kubernetes and cloud-native deployment experience
Public technical presence (talks, blogs, OSS community leadership)

Distributed Systems Engineer (Apache Big Data Internals) - Remote Work | REF#294490
