CIQ builds enterprise infrastructure to support demanding workloads across AI, HPC, and cloud-native environments. The company is seeking a Principal AI Engineer to lead AI/ML innovation, with a focus on model inference optimization, training workflows, and production AI deployment. The role involves designing AI solutions, collaborating across teams, and contributing to CIQ’s overall AI strategy.
Responsibilities:
- Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency
- Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion
- Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform
- Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead
- Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions
- Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies
- Tune training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and optimizer-level improvements
- Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure
- Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers
- Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods
- Build and maintain a library of turn-key AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows
- Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball
- Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings
- Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike
- Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples
- Build and maintain AI-powered engineering tooling that leverages LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization
- Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products
- Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap
- Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community
- Develop deep expertise in CIQ’s Fuzzball platform, including its architecture, scheduling model, and workload execution environment
- Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines
- Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale
- Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands
- Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer
- Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work
- Partner with product and customer success teams to translate real-world AI pain points into engineering priorities and measurable outcomes
- Document and communicate findings clearly, from low-level profiling data to executive-level summaries
- Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ’s reputation as an AI-forward infrastructure company
Requirements:
- Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management
- Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA
- Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent)
- Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling
- Familiarity with HPC environments: job schedulers (Slurm, PBS), parallel filesystems, RDMA/InfiniBand, and MPI, and the intersection of HPC with modern AI workloads
- Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks
- Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work
- Excellent analytical skills; able to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data
- Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders
- A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern
- PhD in Computer Science, Machine Learning, Computer Engineering, or a related field strongly preferred; equivalent industry experience considered
- 10+ years of industry experience in AI/ML engineering, systems software, or a closely related discipline
- Demonstrated track record of measurable, published, or production-deployed AI performance improvements at scale
- Experience working in or with open-source AI ecosystems (PyTorch, Triton, ONNX, Hugging Face, etc.) is a strong plus
- Background with cloud-native, containerized, and/or HPC computing environments preferred