WEX is transforming how businesses operate by embedding advanced AI into its global payments and mobility platforms. The Senior Software Engineer - AI Platform will build and operate the cloud and DevOps foundations of WEX’s AI platform, owning CI/CD workflows and infrastructure automation across AWS and Azure.
Responsibilities:
- Design, implement, and maintain cloud-native infrastructure for AI and LLM-based applications on AWS and Azure, including networking, compute, IAM, and observability
- Build and manage CI/CD and GitOps workflows using tools such as Fabric, Argo CD, and related DevOps tooling to enable reliable, automated deployments
- Develop production-grade services, utilities, and platform components in Python, including APIs and automation scripts that support AI workloads
- Implement and maintain Infrastructure-as-Code (IaC) for repeatable environments and services (e.g., Terraform, CloudFormation, or similar tools)
- Collaborate with AI/ML engineers and product engineering teams to integrate LLMs and agentic AI capabilities into WEX’s products using secure, scalable patterns
- Operate as a system-administrator-level engineer for key cloud services (logging, monitoring, secrets, container orchestration, storage, and security controls)
- Champion AI-native development practices by using and promoting tools such as Cursor, GitHub Copilot, and other AI-assisted IDEs to increase velocity and code quality
- Contribute to and refine platform standards for security, compliance, observability, and resiliency across AI services and pipelines
- Participate in on-call rotations and incident response for AI platform components, driving root-cause analysis and continuous improvement
Requirements:
- 7+ years of professional software engineering or DevOps/SRE experience, including at least 3 years working in cloud-native production environments
- Strong experience with DevOps and GitOps practices, including CI/CD pipelines, automated testing, and environment promotion workflows
- Hands-on experience with Fabric and Argo CD (or similar GitOps tools) for deployment automation and environment management
- Deep, system-administrator-level experience with AWS (preferred) and practical experience with Azure services (compute, networking, IAM, logging, containers)
- Proficiency in Python, including writing robust, maintainable code for services, tools, and automation, and comfort with standard Python development workflows
- Familiarity with containerization and orchestration (Docker, Kubernetes, ECS, AKS, or similar)
- Experience using AI-native development tools (e.g., Cursor, GitHub Copilot, or similar AI-assisted IDEs) as part of your everyday engineering workflow
- Solid understanding of monitoring, logging, and alerting practices and tools (e.g., CloudWatch, Prometheus, Grafana, Datadog, or similar)
- Demonstrated ability to work in agile, collaborative, high-trust engineering teams and to partner effectively with cross-functional stakeholders
- Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent practical experience
Preferred qualifications:
- Experience supporting LLM/agentic AI workloads in production (e.g., prompt orchestration, model gateways, vector stores, RAG systems)
- Background in financial systems, data compliance, or regulated industries (payments, healthcare, etc.)
- Exposure to Terraform or other IaC tools for multi-cloud environments
- Experience building or operating multi-tenant platforms and shared services
- Create, operate, and manage MCP tools and MCP gateways to securely connect AI agents to external systems, enforce governance, and centralize tool access and observability