Role Overview

Lead operational support for Americas customers and participate in the on-call rotation
Perform hands-on L2/L3 troubleshooting across Linux, IMS, SIP, Diameter, Kubernetes, networking, logs, packet captures, and monitoring data
Act as the senior technical escalation point for complex customer issues
Lead major incidents from detection through recovery, setting technical direction, assigning actions, coordinating contributors, and driving resolution
Take the primary communication role during significant incidents for both internal and external stakeholders
Lead incident bridges, provide clear status updates, define communication cadence, and ensure customers, Engineering, Operations, and leadership receive consistent information
Own customer-facing incident communication, recovery confirmation, technical summaries, and RCA input
Escalate effectively to Engineering, Cloud, Product, or external partners with sufficient evidence, traces, logs, impact analysis, and investigation context
Improve monitoring, alerting, runbooks, SOPs, maintenance procedures, and operational documentation
Drive ticket quality, ownership, SLA awareness, and follow-up across the Americas queue
Mentor engineers through real investigations, reviews, shadowing, and technical feedback
Ensure reliable handovers with APAC and EMEA for ongoing incidents, maintenance, risks, and customer commitments
Support the Head of Operations & Support with technical judgement, operational risks, workload observations, and improvement recommendations

Requirements

Strong hands-on Linux administration and troubleshooting experience
Strong networking knowledge: routing, switching, firewalls, NAT, DNS, load balancing, and packet-level troubleshooting
Experience with Wireshark, tcpdump, logs, traces, and production monitoring tools
Strong practical knowledge of SIP and Diameter, including independent interpretation of call flows and signaling traces
Hands-on IMS experience, ideally with components such as P-CSCF, I-CSCF, S-CSCF, DRA, BGCF, B2BUA, HSS, PCRF/PCF, or MMTEL
Experience leading complex production incidents while remaining technically involved in the investigation
Strong customer-facing communication skills and the ability to remain clear and calm under pressure
Experience writing incident updates, RCAs, and technical documentation
Ability to work independently across timezones, take ownership, and close the loop
Fluent English, written and spoken
Strongly preferred:
Kubernetes and AWS operational experience
Experience with Prometheus, Grafana, Alertmanager, Loki, or similar observability tooling
Experience with VoLTE, VoWiFi, EPC, 5G, or mobile-core environments
Familiarity with Helm, Longhorn, Multus, container networking, or cloud-native telecom platforms
Experience with Jira Service Management, Confluence, and structured incident/problem management

Tech Stack

AWS
Cloud
DNS
Firewalls
Grafana
Kubernetes
Linux
Prometheus
Switching

Benefits

Work-life balance is a priority: flexible working that suits you
We live and breathe a hybrid remote culture and don't mind where and when you work
We are committed to building a diverse team that represents a variety of backgrounds, perspectives, and skills in an industry that has traditionally lacked diversity
We offer you the culture of a fast-growing start-up with the maturity of an enterprise company
We are more interested in your experience and knowledge than formal degrees
Entrepreneurial culture and flat hierarchies
Mobility, fitness, language class, and wellness benefits (for full-time employees)
Home office budget (for full-time employees)

Lead Engineer, Operations and Support