Research, design, and develop software components to enable hardware-agnostic AI/ML acceleration for our ESXi server product.
Work directly with GPU partners to integrate, test, and certify their Linux-based drivers and kernel components for use on our platform.
Package and release driver components in line with Broadcom’s established process.
Troubleshoot and address bugs related to AI/ML acceleration functionality.
Deliver software that meets the coding guidelines and quality standards set by VCF.
Develop and maintain technical documentation for delivered features.
Work closely with the larger team, including the virtual driver and device team, as well as external GPU/XPU vendors, to provide end-to-end support for ML frameworks.
Stay up to date with the latest GPU/XPU hardware architectures and AI/ML compiler technologies.
Requirements
Bachelor's degree in Computer Science or a related field with 8+ years of related experience, or a Master's degree with 6+ years of related experience.
Deep understanding of the Linux GPU stack, including device drivers, kernel modules, and user-space components.
Experience with C++ and Python programming languages.
Strong problem-solving skills and ability to troubleshoot complex issues.
Excellent communication and collaboration skills, specifically with external technology partners.
Experience with version control systems such as Git.
Ability to thrive in a fast-paced and dynamic work environment.
Familiarity with enterprise coding standards and best practices.
Nice to Have
Experience with ML frameworks (e.g., PyTorch, JAX) and graph/ML compiler technologies (e.g., OpenXLA).
Experience with build infrastructure using Bazel, Make, Artifactory, etc.
Experience integrating partner software into products.