Research, design, and develop the AI Virtualization Stack for our ESXi server product.
Implement and optimize PyTorch and JAX backends using the OpenXLA framework to ensure high-performance AI/ML workload execution across GPUs and XPUs.
Analyze and re-architect performance-critical sections of the ML acceleration code, focusing on optimization techniques for LLM inference such as KV-caching and FlashAttention.
Troubleshoot and address bugs related to AI/ML acceleration functionality.
Deliver software that meets the coding guidelines and quality standards set by the VCF.
Develop and maintain technical documentation for delivered features.
Work closely with the larger team, including the virtual driver and device teams, as well as external GPU/XPU vendors, to provide end-to-end support for ML frameworks.
Stay up to date with the latest GPU/XPU hardware architectures and AI/ML compiler technologies.
Requirements
Bachelor's degree in Computer Science or a related field and 12+ years of related experience, or Master's degree and 10+ years of related experience.
5+ years of experience in ML framework/runtime development and GPU/XPU backend engineering.
Strong understanding of and direct experience with ML frameworks (PyTorch, JAX) and graph/ML compiler technologies (e.g., OpenXLA).
Experience with C++ and Python programming languages.
Strong problem-solving skills and ability to troubleshoot complex issues.
Experience with version control systems such as Git.
Familiarity with enterprise coding standards and best practices.
Must have legal authorization to work in the US.
Tech Stack
Python
PyTorch
Benefits
Medical, dental and vision plans
401(k) participation, including company matching
Employee Stock Purchase Program (ESPP)
Employee Assistance Program (EAP)
Company-paid holidays
Paid sick leave and vacation time
Company follows all applicable laws for Paid Family Leave and other leaves of absence