Jitsu provides innovative logistics solutions for same-day and next-day delivery. The company is seeking a Senior DevOps/Platform Engineer to enhance its infrastructure and tools, ensuring continuous delivery and operational efficiency for its platform.
Responsibilities:
- Own key parts of the DevOps effort for a global engineering team and a platform that supports millions of transactions per day and is more than doubling annually
- Drive improvement across our development process and tooling: source control, build, test, packaging, release, and deployment
- Drive improvement in our infrastructure configuration, management, and cost efficiency
- Drive enhancement of monitoring and observability across all infrastructure and services, for both availability and performance
- Partner with other technical leaders to improve our architecture’s maintainability, scalability, and resilience
- Serve as a technical lead to software and DevOps engineers on development and operations practices
- Partner with other technical leaders to strengthen our information security posture
- Contribute to the team’s development and operations standards and processes
Requirements:
- Deep experience designing automated pipelines end to end — git-based source control and branching strategies, build and test automation (Jenkins, GitHub Actions, or similar), artifact and dependency management, and progressive, low-risk deployment patterns
- Strong, hands-on experience with Terraform (or similar) — reusable modules, state management, and managing multi-environment infrastructure (staging, beta, production) as code
- Experience operating a GitOps workflow with ArgoCD (or similar) as the source of truth for declarative, auditable deployments
- Strong experience building and operating cloud infrastructure at scale — compute, VPC networking, storage, message queues, serverless, DNS, load balancing, IAM, and logging
- Production experience operating Kubernetes at scale: cluster lifecycle and upgrades, workload scheduling and resource management, autoscaling (HPA, cluster, and event-driven), networking and ingress, and diagnosing complex cluster issues
- Experience authoring, configuring, and maintaining Helm charts for templated, repeatable application deployments across environments
- Experience deploying, operating, and tuning a mix of data stores — relational (PostgreSQL / CloudSQL), NoSQL document (MongoDB), wide-column (Cassandra), and cache (Redis) — including replication, backups, scaling, and performance troubleshooting
- Demonstrated track record deploying and monitoring large-scale, mission-critical services — defining SLIs/SLOs, building actionable alerting, and driving incident response and blameless post-mortems
- Solid grounding in cloud and infrastructure security — IAM, secrets management, network policy, and supply-chain hygiene
- Great communication and documentation skills
- An obsession with automation and a desire to leave things better than you found them
- A customer-first mindset and strong attention to detail
- 5+ years as a DevOps or Site Reliability Engineer
- Experience with Java application release engineering
- AI-driven SRE & DevOps mindset — familiarity with AIOps (anomaly detection, intelligent alerting, predictive scaling, automated remediation) and comfort applying AI/LLMs to simplify how we operate infrastructure, pipelines, and Kubernetes
- Experience building AI agents and automation that reduce operational toil: incident triage, runbook automation, log/RCA summarization, and AI-assisted CI/CD that speeds up delivery and moves us toward self-healing infrastructure
- A sense of humor