CIQ builds enterprise infrastructure to support demanding workloads across AI, HPC, and cloud-native environments. The company is seeking a Principal AI Engineer to lead AI/ML innovation, focusing on model inference optimization, training workflows, and production AI deployment. The role involves designing AI solutions, collaborating with teams across the company, and contributing to CIQ's overall AI strategy.

Responsibilities:

Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency
Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion
Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform
Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead
Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions
Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies
Tune training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and optimizer-level improvements
Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure
Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers
Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods
Build and maintain a library of turn-key AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows
Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball
Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings
Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike
Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples
Build and maintain AI-powered engineering tooling, leveraging LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization
Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products
Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap
Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community
Develop deep expertise in CIQ’s Fuzzball platform: its architecture, scheduling model, and workload execution environment
Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines
Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale
Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands
Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer
Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work
Partner with product and customer success teams to translate real-world AI pain points into engineering priorities and measurable outcomes
Document and communicate findings clearly, from low-level profiling data to executive-level summaries
Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ’s reputation as an AI-forward infrastructure company

Requirements:

Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management
Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA
Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent)
Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling
Familiarity with HPC environments: job schedulers (Slurm, PBS), parallel filesystems, RDMA/InfiniBand, and MPI, and the intersection of HPC with modern AI workloads
Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks
Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work
Excellent analytical skills; able to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data
Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders
A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern
PhD in Computer Science, Machine Learning, Computer Engineering, or a related field strongly preferred; equivalent industry experience considered
10+ years of industry experience in AI/ML engineering, systems software, or a closely related discipline
Demonstrated track record of measurable, published, or production-deployed AI performance improvements at scale
Experience working in or with open-source AI ecosystems (PyTorch, Triton, ONNX, Hugging Face, etc.) is a strong plus
Background with cloud-native, containerized, and/or HPC computing environments preferred
