AWS, Cloud, Docker, EC2, Linux, MariaDB, Postgres, Python, Shell Scripting, TCP/IP, Shell, AI, GitHub Actions, ECS, RDS, CloudFront, Route53, IAM, CloudWatch, PostgreSQL, Datadog, Git, GitHub, Version Control
About this role
Role Overview
Build, maintain, and improve a modern, highly available, scalable, and secure cloud infrastructure on AWS to support our next generation of AI-powered products.
Use AI tools to boost your productivity and effectiveness.
Optimize system performance, troubleshoot complex issues, and implement proactive monitoring solutions.
Automate infrastructure deployments and operational tasks to improve efficiency and reliability.
Collaborate with cross-functional teams to ensure seamless integration and delivery of our products.
Contribute to the continuous improvement of our infrastructure and operational processes.
Requirements
A proven track record of 5+ years managing large-scale production systems.
At least 3 years of experience developing solutions and improvements for large-scale systems.
Experience working in a global, distributed team environment.
Platform Engineering
Strong proficiency in Python for scripting and automation tasks.
Extensive experience with Docker and containerised applications and systems.
Proficiency in POSIX shell scripting, including skills in using awk, sed, grep, etc.
Strong version control skills using Git.
Experience with GitHub Actions and workflows.
Cloud Operations
Advanced Linux skills including iptables, systemd, logging, application management, diagnosis, sysctl tweaking, etc.
Strong experience with AWS services including EC2, ALB/NLB, VPCs, IAM, RDS, ECS, Route53, and CloudFront.
Experience maintaining large-scale legacy systems and performing migrations.
Experience with large-scale databases and database administration, ideally MariaDB and PostgreSQL on Amazon RDS.
Experience with metrics and monitoring systems, ideally Datadog and CloudWatch.
Nice to Have
Strong networking background, including an understanding of TCP/IP and routing protocols.
Experience in optimising database performance and resolving performance issues.