Biohub is a non-profit research lab focused on accelerating scientific discovery through AI and advanced computing. They are seeking a Staff HPC Engineer to lead the evolution of their hybrid HPC and AI platform, integrating cutting-edge technology to support AI biology research and enhance computational capabilities.

Responsibilities:

Build and support a hybrid HPC-AI environment with large-scale on-prem compute/storage and elastic cloud GPU clusters (CoreWeave, AWS, GCP)
Architect and optimize environments for large-scale AI training and tuning, and low-latency scientific workloads
Integrate MLOps and model deployment pipelines into HPC infrastructure, ensuring reproducibility and efficiency
Implement advanced resource scheduling and orchestration (Slurm, Kubernetes, SUNK) optimized for mixed HPC and AI workflows
Support researchers with job optimization, GPU utilization best practices, and performance tuning for AI and HPC applications
Evaluate, deploy, and maintain AI/ML software stacks (e.g., PyTorch, TensorFlow, Hugging Face, RAPIDS) and HPC toolchains
Ensure robust data ingest, analysis, and management capabilities for AI and HPC workloads, including integration with parallel file systems and object storage
Work with diverse science teams to translate research requirements into hardware/software solutions, from experimental design through publication
Promote best practices for AI model training, validation, and deployment in shared computing environments
Foster a culture of shared learning by running internal workshops on HPC-AI tooling (e.g., VS Code remote dev, containerization, MLOps workflows)
