Zettabyte is building the software and systems that enable large-scale AI infrastructure. They are seeking new grad software engineers to work across the entire stack, focusing on both frontend and backend development while managing GPU clusters and Kubernetes infrastructure.
Responsibilities:
- Own features end-to-end — from scoping the problem with users, to writing frontend and backend code, to deploying and monitoring in production
- Build fast, polished interfaces in Vue, Next.js, or React
- Design and build backend services in Go
- Own and operate Kubernetes infrastructure — deployments, scaling, monitoring, troubleshooting
- Build and maintain CI/CD pipelines and infrastructure-as-code
- Administer and harden Linux systems; own networking end-to-end (DNS, firewalls, load balancers, ingress, VPNs)
- Set up monitoring, alerting, and runbooks. Run incident response and post-mortems
- Dogfood our product relentlessly. Use it daily, break it on purpose, file detailed bugs, push the team to ship better software
- Automate everything. If you're doing it manually more than once, script it
- Operate cloud infrastructure (GCP, AWS, or Azure)
- Use AI dev tools (Claude Code, Codex, Cursor, etc.) to move faster and ship more
Requirements:
- 0–2 years of experience. New grads are welcome; what matters is what you can do
- Strong in at least one of: frontend (JavaScript/TypeScript, Vue/React/Next.js), backend (Go, service design, APIs), or infrastructure (Linux, Kubernetes, networking)
- Comfortable working across the stack — you don't need to be an expert in everything, but you're not afraid to jump in
- Testing mindset — you enjoy breaking things and making sure code works
- Product instincts — you think about the user first and notice when something feels off
- High agency — when something is broken, you fix it; when something is missing, you build it
- Ship fast, iterate, communicate clearly
- Comfortable with ambiguity. Requirements change. Priorities shift. You thrive in that
Nice to have:
- Fluent in Chinese (Mandarin): strong bonus
- Deeper Kubernetes experience — RBAC, networking policies, operating clusters in production
- Terraform, Pulumi, or other IaC tools
- Observability stacks (Prometheus, Grafana, Datadog, ELK)
- Service meshes, gRPC, or message queues (Kafka, NATS, RabbitMQ)
- Design sensibility — Figma to pixel-perfect, or skip Figma entirely
- Familiarity with AI/ML tools and integrating LLMs into products
- Bare metal or on-prem experience
- Security hardening or pen-testing background
- Contributions to open source or notable side projects
- A homelab you can tell us about in unreasonable detail