Platform Resiliency Lead (DR and BCRM)
US (Remote in US)
Long Term
Contract

The Platform Resiliency Lead is accountable for ensuring that enterprise digital platforms are designed, implemented, tested, and operated with appropriate resilience, disaster recovery, and reliability controls aligned to business criticality. The role partners closely with Platform Owners, Engineering, Infrastructure, Security, and Business Continuity teams to reduce the risk and impact of technology disruptions and to ensure recovery objectives are achievable and tested.

This role holds a key leadership position in driving Business Technology Resilience (BTR) outcomes, including application impact assessments, disaster recovery planning, testing, and continuous improvement across platform portfolios. It also ensures end-to-end testing of supported applications.

- Ensure compliance with Mars BTR and DR goals, embed platform reliability governance into development and deployment processes and the operating model, and continuously improve that governance.

- Ensure ways of working are consistent, measurable, audit-ready, and low-friction across engineering and platform/tooling teams, so that the system availability required by the business is delivered at minimal cost and maximum output.

- Help establish "guardrails not gates" by embedding governance into all workflows so teams can move at speed while maintaining a strong control posture and complying with Mars and industry frameworks.

- Ensure development tools and pipelines align to Mars standards and secure coding/shift-left principles.

Disaster Recovery & Preparedness

Ensure Application and Platform Disaster Recovery Plans (A/PDRPs) are created, maintained, and reviewed in line with platform criticality and policy expectations.

Govern execution of disaster recovery testing, including scenario-based exercises and validation of achieved RTOs and RPOs, with clear evidence capture and remediation tracking.

Coordinate with Infrastructure, Network, Security, and Platform teams to confirm dependencies, recovery sequencing, and operational readiness during DR events.

Incident & Recovery Leadership

Provide resilience leadership during major incidents and disaster recovery events, supporting command-and-control execution and decision making.

Ensure lessons learned from incidents, DR tests, and near misses are translated into measurable improvements in platform design, tooling, and processes.

Continuous Improvement & Enablement

Track and report resilience maturity, adherence, and gaps across platforms using standardized metrics and compliance reporting.

Partner with engineering and SRE teams to strengthen reliability practices such as observability, failover, backup, and controlled recovery mechanisms.

Promote resilience awareness and enablement across platform and delivery teams through guidance, templates, and training.

Key Stakeholders

Platform & Product Owners

Infrastructure, Cloud, Network, and Security teams

Business Continuity Management (BCM)

Enterprise Architecture and Risk Management

Internal Audit and External Assurance Partners

Required Experience & Skills

Experience

Strong experience in platform reliability, disaster recovery, or resilience engineering within large-scale digital or cloud environments.

Proven delivery of resilience or DR programs across multiple platforms or applications with differing criticality.

Experience operating in regulated or audit-intensive environments.

Technical & Functional Skills

Deep understanding of DR concepts (RTO, RPO, backup strategies, recovery sequencing).

Familiarity with cloud and SaaS recovery patterns and dependency management.

Ability to translate technical resilience topics into clear risk, impact, and investment discussions for non-technical stakeholders.

Leadership & Communication

Influences without direct authority; able to align diverse teams around resilience outcomes.

Strong documentation and governance discipline suitable for audit and regulatory scrutiny.

Comfortable operating in high-pressure incident or recovery scenarios.

Success Measures

Platforms meet or exceed defined resilience and recovery objectives.

Disaster recovery plans are complete, tested, and evidenced for critical platforms.

Reduction in unmanaged resilience risks and audit findings.

Improved recovery readiness and confidence across platform delivery teams.

Munesh

CYBER SPHERE LLC
