Provide high-quality, timely L2.5 support for PHP applications running on EKS with MySQL backends, operating within clear guardrails that permit configuration changes, feature flag operations, scripted runbooks, and safe, bounded code-level fixes.
Model a shift-left mindset: resolve more at L2.5, automate more, and escalate less, increasing the percentage of incidents resolved without L3 involvement and improving MTTR.
Participate in a healthy, sustainable on-call rotation with fair schedules, clear escalation paths, and strong post-incident learning practices.
Apply engineering discipline to operational work: use version control, code review, and testing standards for scripts, runbooks, and automation tooling you produce.
Collaborate with peers to ensure the right monitoring signals, dashboards, and alerts exist in Datadog. Tune app-level alerts and dashboards to minimize noise and surface actionable signals.
Act as a first responder for application incidents at L2.5: triage, diagnose, and remediate within guardrails (e.g., safe config changes, feature flag toggles, rolling restarts, cache purges, scripted data fixes). Support major incidents by providing technical context, structured diagnostics, Datadog/Kibana evidence, Heap impact analysis, and coordinated remediation alongside the incident commander.
Requirements
Proven experience in application support or operations engineering in cloud environments, ideally supporting PHP services running on Kubernetes (EKS) with MySQL backends.
Hands-on capability in at least one backend language (PHP preferred; Python or similar also valuable) sufficient to read, diagnose, and write safe operational scripts and minor fixes under guardrails.
Practical Kubernetes skills for operations: kubectl/Helm basics, investigating pods/deployments, reading logs/events, understanding readiness/liveness probes, and performing safe rollouts/rollbacks within documented guardrails.
MySQL operational fluency: connection and pool issues, slow query detection, query plan basics, common remediation patterns (e.g., indexing recommendations to hand to L3, safe data fixes under runbook guardrails), and understanding of replication/backup implications.
Strong experience using Datadog (APM/metrics/traces/dashboards/alerts) for investigation and detection; confidence using Kibana for log exploration and correlation; ability to use Heap to assess user impact and prioritize remediation.
Familiarity with ITSM tooling (e.g., Jira Service Management) and ITIL-aligned incident and problem management processes.
Strong communication skills; clear, concise documentation; collaborative approach focused on reducing toil, increasing automation, and raising the quality bar.
Tech Stack
Cloud
ITSM
Kubernetes
MySQL
PHP
Python
Benefits
Annual Wellness Bonus
Monthly Edenred Electronic Food Voucher
Udemy: Access for your professional development
Flexible holiday plan and other leave benefits
Book Benefit: Professional development books and an additional annual budget for fiction books of your choice