Docusign is a leading company in e-signature and contract lifecycle management, transforming into an Intelligent Agreement Management platform. The Principal Software Engineer will lead the evolution of the Site Reliability Engineering organization, focusing on building reliable systems and mentoring other engineers.

Responsibilities:

Lead and code with the team
Lead the cultural and technical shift toward treating reliability as a product feature
Move the org away from reactive "ops" work toward building durable platforms and self-healing systems
Bring elite Incident Commander skills: although not expected to join the daily on-call rotation, step in during high-stakes outages to bring calm and clarity, then use those experiences to architect systems that keep the same incidents from recurring
Define the "Golden Paths" for our cloud migration, ensuring that as Docusign scales globally, our architecture remains "Multi-Active" and resilient to regional cloud failures
Challenge the status quo, mentoring Senior and Staff SREs to think like software architects
Advocate for "Error Budgets" that have real teeth, influencing product roadmaps to prioritize long-term stability

Requirements:

15+ years of experience in large-scale distributed systems, software engineering, or infrastructure roles, with a track record of driving system architecture
A software engineer by trade with deep proficiency in Go or Python, a 'code-first' approach, and a passion for writing production-grade automation services alongside the engineering team
Proven technical leadership in building global, active-active distributed systems at hyperscale, functioning simultaneously as an architect and an engineering peer
Production-hardened mastery of Kubernetes and Terraform for managing complex, multi-tenant cloud topologies
Experience acting as the primary Incident Commander for tier-0 global outages, with the ability to turn operational chaos into actionable technical stabilization
Experience defining 'Developer Experience' strategies and contributing to Internal Developer Platforms (IDPs) that bake resilience and infrastructure abstractions directly into developer workflows
Technical expertise executing high-stakes on-premises-to-cloud migrations on Microsoft Azure (specifically using Azure Kubernetes Service / AKS and Azure traffic routing)
Hands-on experience architecting global distributed tracing capabilities using the OpenTelemetry ecosystem to track deep, user-centric SLO metrics across microservices
Experience developing self-healing infrastructure patterns through a blend of deterministic code and AI-assisted/predictive anomaly remediation models
Experience championing and setting up automated fault-injection frameworks that proactively prove system recoverability before a real production incident occurs
Experience building safe deployment architectures (Canary, Blue/Green) managed via secure pipelines (GitHub Actions, Azure DevOps) with automated safety policies embedded directly into the code lifecycle
