Role Overview

Own the AWS architecture underpinning Toqan and Toqan Claw — not as a supporting function, but as the person who defines how these systems scale.
Design the cloud infrastructure patterns that let AI workloads run reliably across 10+ operating companies (OpCos).
Set the guardrails that keep the whole AI Platform team building consistently and safely.
Review infrastructure performance dashboards for Toqan and Toqan Claw — checking latency, availability, and cost signals across the OpCos currently live on the platform.
Work alongside a team of 10 engineers who are moving fast and thinking at group scale.

Requirements

Deep hands-on AWS expertise across compute, networking, storage, and managed services — you know which service to reach for and why
Proven track record of scaling infrastructure for AI or ML workloads in production, where reliability and latency are non-negotiable
Strong command of infrastructure-as-code — Terraform, CDK, or equivalent — applied at real scale, not just in greenfield projects
Experience operating in a multi-product or platform-team context, where your decisions ripple across multiple engineering teams and products
Proficiency in Go (5+ years), with a track record of building and operating production backend services; Python is a bonus
Hands-on experience integrating with multiple AI and LLM providers in production — you understand how model capabilities translate into robust, scalable backend systems
Comfortable owning CI/CD pipelines, automated test infrastructure (unit, integration, E2E), and build systems end-to-end
Systems-level thinking — you design for reliability, scalability, and performance from the start, not as an afterthought
Comfortable defining and enforcing infrastructure standards and guardrails — you've set the bar for a team, not just met it
Experience with LLM serving infrastructure — vLLM, Triton, SageMaker, or similar — is a strong plus
Familiarity with Kubernetes and container orchestration at scale is a plus
Experience building and maintaining event-sourced systems is a plus
Direct experience building MCP servers or working with Model Context Protocol is a plus

Tech Stack

AWS
Cloud
Kubernetes
Python
Terraform
Go

Benefits

Competitive compensation
Comprehensive benefits
Hybrid work setup based in Amsterdam.
Full details shared during the process.

Senior Infrastructure Engineer, AI Platform
