Advanced Micro Devices, Inc. (AMD) is committed to building innovative products that enhance next-generation computing experiences. This role is for a senior-level engineer who will drive the strategy, architecture, optimization, and tooling for AI performance on AMD GPUs, collaborating across teams to achieve optimal performance throughout the software stack.
Responsibilities:
- Help set strategy and roadmap for AMD Collectives and Network optimizations
- Provide guidelines to customers on efficient network load-balancing, workload scheduling and model sharding strategies
- Tune, profile, and analyze the performance of large-scale LLM, diffusion, multimodal, RecSys, and generative AI models, both single-node and distributed, and explore the associated tradeoffs and design decisions
- Participate in hardware-software co-design for future hardware optimizations, especially on scale-up networks, NICs, and scale-out networks
- Develop and improve frameworks, tools, and infrastructure for performance estimation, modeling, and reporting
- Communicate and present the results of performance analysis and modeling to stakeholders and senior leadership, and provide concrete recommendations
- Collaborate across teams and the wider organization to identify opportunities and develop strategies