Astronomer empowers data teams with its unified DataOps platform powered by Apache Airflow®. The role involves owning and developing platform infrastructure strategy, building foundational systems, and ensuring reliability and scalability for Astronomer’s products.

Responsibilities:

Own and develop our platform infrastructure strategy, with the sponsorship and responsibility to match
Map out what we need, make the calls, and own the outcomes
Be directly involved in deciding what we work on and how we work on it
Make promises, and keep them
Make principled build vs. buy assessments and advocate for the right tools for the right job — not the fashionable ones, not the ones already in the estate just because they’re there
Create and maintain comprehensive internal documentation and decision records for systems and processes
Participate in architectural forums and make principled, open decisions that the rest of the organisation can learn from and hold us to

Requirements:

Distributed systems depth, grounded in practice. You have a solid working model of how production systems fail: consistency and availability tradeoffs, failure cascades, backpressure, graceful degradation. You can draw the diagram, explain the failure modes at each node, and make a reasoned argument for which ones actually matter in a given context. Non-Abstract Large System Design (NALSD) thinking is how you naturally approach a new system design
Kubernetes at operator depth. You know what happens inside the scheduler and the control loop when things go wrong, because you've been there. You've operated clusters under real load, not just deployed workloads onto them
Strong Go proficiency. The platform team writes production Go. You should be fluent: you've built and shipped systems in it, and you have opinions about what good Go looks like
Multi-cloud experience, not just multi-cloud exposure. You've made considered architectural decisions across AWS, GCP, and/or Azure — not just consumed managed services, but evaluated tradeoffs between them and lived with those decisions in production
Experience defining requirements and driving technology choices across an engineering organisation. You've been the person in the room who frames the decision correctly, not just the one who executes it
Strong written and verbal communication. You can write a design doc that changes minds, and a postmortem that makes the organisation smarter. You've worked effectively in a globally-distributed team
Experience with storage primitives at the system level — you've reasoned about when to reach for a relational store vs. an object store vs. something else, and you have real opinions informed by real failures
Experience working on a SaaS/PaaS product across multiple cloud providers
Familiarity with Apache Airflow or workflow orchestration systems

Staff Software Engineer, Platform Infrastructure
