Design, build, and maintain monitoring and synthetic test suites for the NG-SIEM pipeline
Engineer orchestrated scaling solutions for the NG-SIEM pipeline
Serve as a subject matter expert during platform-wide incidents
Build and refine models for end-to-end capacity forecasting
Transform manual standard operating procedures into automated remediation workflows
Partner with cell-level teams, product engineering, and external stakeholders for incident management
Identify and drive systemic improvements across teams
Requirements
10+ years of experience in software engineering, site reliability engineering, or platform engineering
Strong proficiency in at least one systems programming language (Go, Java, Rust, or C++) and one scripting language (Python, Bash)
Deep experience with end-to-end observability — building monitoring pipelines, defining SLIs/SLOs, and creating dashboards that drive actionable insights
Demonstrated ability to diagnose and resolve complex incidents spanning multiple distributed components operating 24/7
Experience with coordinated capacity planning and scaling for systems with significant infrastructure footprints
Hands-on experience with streaming platforms (Kafka or similar)
Familiarity with infrastructure-as-code, CI/CD pipelines, and automated deployment practices
Strong written and verbal communication skills
Tech Stack
Java
Kafka
Python
Rust
Go
Benefits
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections