Role Overview
Design, develop, and improve software that provides business, platform, and technology capabilities for our customers and colleagues, utilising a range of engineering methodologies.
Develop and deliver high-quality software solutions using industry-aligned programming languages, frameworks, and tools.
Ensure that code is scalable, maintainable, and optimised for performance.
Collaborate cross-functionally with product managers, designers, and other engineers to define software requirements, devise solution strategies, and ensure seamless integration and alignment with business objectives.
Collaborate with peers, participate in code reviews, and promote a culture of code quality and knowledge sharing.
Stay informed of industry technology trends and innovations, and actively contribute to the organisation’s technology communities to foster a culture of technical excellence and growth.
Adhere to secure coding practices to mitigate vulnerabilities, protect sensitive data, and ensure secure software solutions.
Implement effective unit testing practices to ensure proper code design, readability, and reliability.
Requirements
Good SRE experience, including implementing reliability practices, defining SLOs, error budgets, and uptime targets, and supporting incident response for multi-layered distributed systems.
Proficient in cross-region deployments on cloud platforms (AWS, Azure, GCP), observability tools (Prometheus, CloudWatch), and automation frameworks, with experience reducing toil.
Solid infrastructure-as-code skills (Terraform, CloudFormation), with experience building self-service platforms, tooling, and deployment pipelines for engineering teams.
Demonstrated ability in capacity planning, performance tuning, and cost optimisation, with experience managing production systems at significant scale.
Experience in observability and monitoring, including metrics collection, logging, distributed tracing, and dashboard creation.
Security experience, including encryption, secrets management, API key management, and authentication and authorisation standards such as OAuth and SAML.
Programming and/or scripting experience with Python for automation and monitoring.
Experience with modern testing approaches, including chaos and performance testing.
Understanding of large language models (LLMs) and agentic frameworks.
A self-starter with good collaboration skills and leadership experience mentoring SREs, running post-mortems, and driving reliability improvements across engineering teams.
Bachelor’s degree in computer science or a related field, with depth in systems engineering, networking, and distributed systems design.