Astronomer empowers data teams with its unified DataOps platform powered by Apache Airflow®. The role involves owning and developing platform infrastructure strategy, building foundational systems, and ensuring reliability and scalability for Astronomer’s products.
Responsibilities:
- Own and develop our platform infrastructure strategy, with the sponsorship and responsibility to match
- Map out what we need, make the calls, and own the outcomes
- Be directly involved in deciding what we work on and how we work on it
- Make promises, and keep them
- Make principled build vs. buy assessments and advocate for the right tools for the right job — not the fashionable ones, not the ones already in the estate just because they’re there
- Create and maintain comprehensive internal documentation and decision records for systems and processes
- Participate in architectural forums and make principled, open decisions that the rest of the organisation can learn from and hold us to
Requirements:
- Distributed systems depth, grounded in practice. You have a solid working model of how production systems fail: consistency and availability tradeoffs, failure cascades, backpressure, graceful degradation. You can draw the diagram, explain the failure modes at each node, and make a reasoned argument for which ones actually matter in a given context. Non-abstract large system design (NALSD) thinking is how you naturally approach a new system design
- Kubernetes at operator depth. You know what happens inside the scheduler and the control loop when things go wrong, because you've been there. You've operated clusters under real load, not just deployed workloads onto them
- Strong Go proficiency. The platform team writes production Go. You should be fluent: you've built and shipped systems in it, and you have opinions about what good Go looks like
- Multi-cloud experience, not just multi-cloud exposure. You've made considered architectural decisions across AWS, GCP, and/or Azure — not just consumed managed services, but evaluated tradeoffs between them and lived with those decisions in production
- Experience defining requirements and driving technology choices across an engineering organisation. You've been the person in the room who frames the decision correctly, not just the one who executes it
- Strong written and verbal communication. You can write a design doc that changes minds, and a postmortem that makes the organisation smarter. You've worked effectively in a globally distributed team
- Experience with storage primitives at the system level — you've reasoned about when to reach for a relational store vs. an object store vs. something else, and you have real opinions informed by real failures
- Experience working on a SaaS/PaaS product across multiple cloud providers
- Familiarity with Apache Airflow or workflow orchestration systems