Duetto is the industry-leading hospitality revenue management system, helping hotels, resorts, and casinos optimize revenue and boost profit. The Senior Site Reliability Engineer will design, implement, and maintain scalable systems, ensuring high availability and security while collaborating with cross-functional teams.

Responsibilities:

Architect and implement infrastructure solutions to facilitate seamless migration of critical systems while ensuring uptime, reliability, and a high-quality experience for end users
Design, develop, test, and maintain tools and processes to efficiently manage and operate SaaS products hosted on AWS, with a focus on scalability and automation
Partner with developers to enhance the reliability, performance, scalability, and security of server and application architectures
Build and maintain critical components of our infrastructure, emphasizing robustness, security, and high availability to meet demanding service-level expectations
Foster strong cross-team collaboration by driving engagement, promoting shared goals, and ensuring alignment across technical and non-technical teams
Lead efforts to ensure systems are secure by default, addressing vulnerabilities proactively and implementing best practices for cybersecurity preparedness
Be willing to learn and adopt AI in DevOps/SRE workflows
Be the last line of support for services that thousands of customers (hotels, resorts, casinos, etc.) around the world depend on 24/7
Troubleshoot on-call incidents to ensure rapid resolution and minimal service disruption. Participate in detailed Root Cause Analysis (RCA) to identify underlying issues and work cross-functionally to implement preventative measures and long-term solutions, ensuring similar problems are avoided in the future

Requirements:

5+ years of experience in an Ops, DevOps or SRE role
Experience in System Design and Architecture
Engineer-level experience with networking and security concepts
Understanding of the fundamentals behind load-balancing technologies; experience configuring Layer 7 load balancing is a plus
Experience collaborating with engineers on architecture decisions
Experience administering cloud computing services such as AWS (preferred), Azure, or GCP, including working knowledge of permissions structures, multi-account management structures, and single sign-on (SSO)
Experience with AWS ecosystem tools such as AWS IAM, VPC, EC2, ELB, RDS, S3, Lambda, API Gateway, Secrets Manager, KMS, CloudWatch, CloudTrail
Experience with security compliance certifications such as SOC2
Experience working in an environment with a heavy emphasis on a DevOps and service-reliability mindset
Experience provisioning, configuring, administering, and using enterprise monitoring ecosystems such as Prometheus, Grafana, Datadog, or similar
Experience with CI/CD tools such as GitHub, GitHub Actions, JFrog Artifactory, and Jenkins, and with GitOps methodologies
Experience writing and using infrastructure-as-code with Terraform
Experience with configuration-management toolsets such as Chef or Puppet
Experience with containers and container orchestration tools such as ECS/EKS (a plus)
Experience managing infrastructure and contributing as part of a multi-user infrastructure team, using Terraform and associated toolsets. Relevant SOC2 experience is also a plus
Fluency in reading Java, Ruby, Bash/Zsh, HCL, Python, and JavaScript
Strong experience in troubleshooting and resolving complex on-call incidents with a focus on minimizing service disruption and downtime
Proven ability to lead and participate in detailed Root Cause Analysis (RCA) processes to identify and address underlying issues effectively
Demonstrated expertise in implementing preventative measures and long-term solutions based on RCA findings to ensure recurring issues are mitigated
Experience constructing and maintaining build/deploy automation tooling
Willingness to participate in a weekly on-call rotation
Ability to work both independently and within a team environment
A passion for technology and a drive to stay up to date with best practices
