Databricks is a leading data and AI company that empowers teams to tackle complex challenges through innovative solutions. The Staff Backline Engineer will troubleshoot and optimise the Data and AI infrastructure, ensuring the stability and reliability of production workloads while driving product improvements and operational excellence.

Responsibilities:

Conduct deep-dive forensics into Spark core internals and the broader Databricks Data and AI ecosystem to resolve high-priority architectural failures and complex system anomalies
Perform advanced code-level analysis and resource profiling to identify and mitigate systemic root causes, ensuring the stability and reliability of high-scale production workloads
Optimise architectural performance across the Data and AI stack by refining execution parameters and enforcing best-practice strategies to maximise resource efficiency and throughput
Analyse global issue trends and patterns, and partner directly with Product Engineering to influence the product roadmap and drive initiatives that enhance long-term supportability
Develop reproduction frameworks, automated workflows, and AI-driven diagnostic tools that translate complex backline findings into standardised resolution paths to empower and scale the broader organisation

Requirements:

10+ years of relevant experience
Deep expertise in one of the following three specialised tracks: Data Engineering, Product Supportability, or AI
Proven experience in managing both customers and technical stakeholders
For Data Engineering Track:

Expertise in large-scale big data solutions and ETL pipelines using Spark, Delta Lake, or Hive
Strong experience troubleshooting failures, diagnosing performance issues, and identifying root causes
Demonstrated problem-solving ability and understanding of data engineering best practices
Solid hands-on programming skills in Python, SQL, or Scala

For Product Supportability Track:

Deep understanding of distributed system internals
Ability to perform code-level root-cause analysis and profiling in Java, Scala, or Python
Proven record of contributing to bug fixes and mentoring other engineers

For AI Track:

Experience with large-scale machine learning and generative AI systems
Strong grasp of model training, evaluation, and deployment in distributed environments
Experience managing the ML lifecycle, including governance and operationalisation
Skilled in diagnosing and optimising distributed ML workloads for performance and scalability

Staff Backline Engineer - Data & AI
