AuthZed is a Series A company focused on fixing broken access control with innovative products. As a Site Reliability Engineer, you will ensure the reliability and performance of systems while designing and maintaining scalable infrastructure solutions to support a growing customer base.
Responsibilities:
- Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and customers
- Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability
- Automate infrastructure deployment and configuration management processes
- Continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning
- Troubleshoot and resolve complex infrastructure and application issues in production and test environments
- Collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure
- Participate in on-call rotation and respond to production incidents in a timely manner
- Document system configurations, troubleshooting procedures, and operational guidelines
Requirements:
- Proven experience as a Site Reliability Engineer or in a similar role
- Strong understanding of networking, operating systems, and cloud infrastructure
- Experience with Site Reliability Engineering, System Design, and Distributed Computing
- Experience with multiple programming languages (we currently have SDKs for NodeJS, Java, Python, Ruby, and Go)
- Experience with containerization technologies such as Docker and Kubernetes
- Knowledge of infrastructure-as-code tools like Terraform and Pulumi
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
- Experience with lower-level implementation details of relational databases (bonus if you have experience with distributed SQL databases like Google Cloud Spanner or CockroachDB)
- Experience working with Git and GitHub
- Experience with continuous integration and deployment systems
- Strong problem-solving and troubleshooting skills
- Excellent communication and collaboration abilities
- Experience with authorization systems