Considering porting or optimizing your code for AMD GPUs? This course gives an introduction to the AMD Instinct™ GPU architecture and its ROCm™ ecosystem, including the tools to develop or port HPC or AI applications to AMD GPUs. Participants will be introduced to the programming models for the MI200 and MI300 series GPUs and APUs. It has never been easier to program GPUs using a wide range of GPU programming models. We will cover how to use pragma-based languages such as OpenMP, the basic GPU programming language HIP, and performance-portable languages such as Kokkos and RAJA. In addition, there will be presentations on other important topics such as GPU-aware MPI. The AMD tool suite, including the debugger rocgdb and the profiling tools rocprof, omnitrace, and omniperf, will also be covered. A short introduction will be given to the AMD machine learning software stack, including PyTorch and TensorFlow, and how they have been used in HPC.
After this course, participants will
- have learned about the many GPU programming languages for AMD GPUs,
- have gained knowledge about the AMD programming tools,
- understand how to get performance scaling,
- have been introduced to the AMD machine learning (ML) and artificial intelligence (AI) software,
- know about profiling and debugging resources.
Prerequisites:
Some knowledge of GPU and/or HPC programming is expected. Participants should have an application developer’s general knowledge of computer hardware, operating systems, and at least one HPC programming language.
Requirements:
Participants must bring a laptop running a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges.
They are also required to abide by the ARCHER2 Code of Conduct.
Timetable:
Day 1 Tuesday October 1st – Topics Covered: AMD Programming Model, OpenMP
- 12:45 to 13:00 Drop in time
- 13:00 Host Organization Intro – Bob Robey
- 13:10 AMD Presentation Roadmap and Introduction to System for Exercises – Bob Robey
- 13:20 Programming Model for MI200 and MI300 series – Giacomo Capodaglio
- 13:45 Programming Model Exercises
- 14:00 Break
- 14:10 Introduction to OpenMP® Offloading – Johanna Potyka
- 14:40 OpenMP® Exercises
- 14:55 Break
- 15:10 Real-World OpenMP® Language Constructs – Shelby Lockhart
- 15:45 OpenMP® Language Constructs Exercises
- 16:00 Advanced OpenMP® – zero-copy, debugging and optimization – Samuel Antao
- 16:30 Advanced OpenMP® Exercises
- 16:50 Wrap up – Bob Robey
Day 2 Wednesday October 2nd – Topics Covered: HIP and OpenMP®/HIP interoperability
- 12:45 to 13:00 Drop in time
- 13:00 HIP and ROCm – Giacomo Capodaglio
- 14:00 HIP and ROCm Exercises
- 14:15 Break
- 14:30 Porting code to HIP – Giacomo Capodaglio
- 14:50 Porting Exercises
- 15:00 OpenMP® and HIP Interoperability – Bob Robey
- 15:40 Interoperability Exercises
- 16:00 Break
- 16:15 Optimizing HIP Code – Gina Sitaraman
- 16:40 HIP Optimization Exercises
- 16:55 Wrap up – Gina Sitaraman
Day 3 Thursday October 3rd – Topics Covered: MPI, Kokkos, C++ StdPar
- 12:45 to 13:00 Drop in time
- 13:00 GPU-Aware MPI on AMD GPUs – Shelby Lockhart
- 13:30 MPI Exercises
- 14:00 MPI Ghost Exchange Example with MI300A – Gina Sitaraman
- 14:30 MPI Ghost Exchange Exercises
- 14:50 Break
- 15:00 Performance Portability Frameworks (Kokkos) – Bob Robey
- 15:30 Kokkos Exercises
- 15:50 Break
- 16:00 C++ Standard Parallelism (StdPar) – Bob Robey
- 16:30 C++ Standard Parallelism Exercises
- 16:50 Wrap up – Bob Robey
Day 4 Friday October 4th – Topics Covered: AMD Debuggers and Profiling Tools
- 12:45 to 13:00 Drop in time
- 13:00 Debugging with Rocgdb – Samuel Antao
- 13:40 Rocgdb Exercises
- 14:00 Break
- 14:15 GPU Timeline Profiling (Rocprof, Omnitrace) – Georgios Markomanolis and Luka Stanisic
- 14:55 Timeline Profiling Exercises
- 15:15 Break
- 15:30 Kernel Profiling with Omniperf – Ian Bogle
- 16:15 Kernel Profiling Exercises
- 16:45 Additional Training Resources – Bob Robey
- 16:55 Wrap up – Bob Robey