Serve as the team's database expert, the first person to investigate, diagnose, and resolve complex performance problems across our production database systems (MongoDB, OpenSearch, PostgreSQL, Cassandra).
Perform deep-dive root cause analysis on database performance issues, understanding query execution internals, resource consumption patterns, cluster behavior, and system-level interactions to identify the real source of problems, not just symptoms.
Design and propose better database architectures and solutions, recommending when to re-architect data models, migrate workloads, introduce new technologies, or redesign how services interact with their data layer.
Work closely with the rest of the team to ensure the data architecture is well designed.
Own capacity planning, scaling strategies, and high-availability designs for database clusters, ensuring systems are built to handle the team's growth trajectory.
Act as the bridge between development and infrastructure, advising engineers on how their application patterns impact database performance and guiding them toward sustainable solutions.
Build and maintain CI/CD pipelines, infrastructure-as-code (Terraform, Helm, Kubernetes manifests), and automated deployment workflows for the xspm team's services.
Design and manage observability stacks (dashboards, alerting rules, and SLOs) to maintain best-in-class availability for critical data pipelines and services.
Drive infrastructure automation to reduce operational toil, including automated scaling, self-healing systems, and configuration management.
Participate in on-call rotations, incident response, and post-incident reviews, driving root-cause analysis and long-term reliability improvements.
Evaluate and adopt new database technologies and infrastructure tooling that align with the team's evolving data architecture needs.
Requirements
7+ years of experience in DevOps, SRE, DBA, or infrastructure engineering, with significant hands-on responsibility for production database systems at scale.
Expert-level knowledge of a widely used database such as MongoDB, OpenSearch, PostgreSQL, or Cassandra, including a deep understanding of its internals, performance characteristics, replication, and sharding, and the ability to diagnose and solve complex issues from first principles.
A problem-solver's mindset: you don't stop at "the database is slow." You investigate why, trace the root cause across application, query, and infrastructure layers, and design the fix.
Strong experience with at least one additional database technology (PostgreSQL, Cassandra, Redis, or similar).
Proficiency in at least one programming or scripting language (Python, Go, etc.) for building automation, tooling, and operational scripts.
Deep experience with containerized environments (Kubernetes, Docker) and cloud infrastructure on AWS, Azure, or GCP.
Strong knowledge of infrastructure-as-code tools (Terraform, Ansible, or equivalent).
Experience designing and operating CI/CD pipelines (Jenkins, GitLab CI, ArgoCD, or similar).
Solid understanding of networking, Linux systems administration, and security hardening practices.
Bachelor's degree or equivalent work experience in a relevant field.
Tech Stack
Ansible
AWS
Azure
Cassandra
Cloud
Docker
Google Cloud Platform
Jenkins
Kubernetes
Linux
MongoDB
PostgreSQL
Python
Redis
Terraform
Go
Benefits
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections