Collaborate with the Enterprise Architecture team on architectural decisions and evolve engineering standards across the Prepurchase domain.
Partner with product, security, and SRE teams to align technical decisions with business priorities.
Drive observability improvements, ensuring services are instrumented for monitoring, alerting, and rapid incident response.
Identify and eliminate single points of failure, improving system reliability and reducing on-call burden.
Apply AI and machine learning tools to improve developer productivity, automate operational tasks, and enhance system capabilities.
Evaluate and introduce new technologies that improve performance, reliability, or engineering velocity.
Requirements
Proven ability to reason about distributed systems tradeoffs (scalability, consistency, availability, latency) and make defensible design decisions under real constraints.
Proven experience writing production code and building high-traffic systems at scale.
Expertise in Java (17+) and the JVM, with strong command of JVM internals, garbage collection behavior, and performance tuning under load.
Hands-on experience with a reactive, non-blocking framework (Vert.x, Spring WebFlux, or equivalent) and asynchronous, event-driven service design.
Deep experience with stream processing and event-driven architecture: Kafka Streams, Apache Flink, or equivalent.
Proven track record building high-throughput, low-latency systems where tail latency, backpressure, and memory pressure are first-class design concerns.
Strong command of gRPC (including streaming RPC) and binary serialization formats such as FlatBuffers, Avro, or Protobuf, with schema evolution via a registry.
Familiarity with Data-Oriented Design (DoD): structuring code around how data is laid out, accessed, and transformed for cache efficiency and throughput.
Experience with compact, search-oriented data structures (e.g. RoaringBitmap, succinct/bitset representations) for representing large in-memory state efficiently.
Experience with search engines (Elasticsearch, Solr) for discovery workloads is a plus.
Strong grasp of microservice design, service mesh (Istio/Envoy), API contract evolution, and backend-for-frontend patterns, including GraphQL APIs and WebSocket subscriptions for real-time client delivery.
Hands-on experience with AWS (EKS) and cloud-native operations: containerization (Docker, Kubernetes), packaging and deploy tooling (Helm, Kustomize), and infrastructure-as-code (Terraform).
Proficiency with caching strategies (Redis, CDN layer caching) and their application to high-traffic systems.
Solid understanding of CI/CD pipelines (GitLab CI) and progressive deployment strategies (blue-green, canary).
Experience using LLMs, AI-assisted development, and automation to accelerate engineering workflows.
Familiarity with observability tooling: Grafana, Splunk, Prometheus, OpenTracing, or equivalent.
Strong understanding of security best practices: OAuth/OIDC, input validation, secrets management.
Tech Stack
Apache
AWS
Cloud
Distributed Systems
Docker
Elasticsearch
Grafana
GraphQL
gRPC
Java
Kafka
Kubernetes
Prometheus
Redis
Splunk
Spring
Terraform
Benefits
Medical, vision, dental and mental health benefits for you and your family, with access to a health care concierge, and Flexible or Health Savings Accounts (FSA or HSA)
Free concert tickets, generous paid time off including paid holidays, sick time, and personal days
401(k) program with company match, stock reimbursement program
New parent programs including caregiver leave, plus fertility, adoption, foster, or surrogacy support
Career and skill development programs with School of Live, tuition reimbursement, and student loan repayment