Accelerating your Applications with AMD GPUs

Considering porting or optimizing your code for AMD GPUs? This course will give an introduction to the AMD Instinct™ GPU architecture and its ROCm™ ecosystem, including the tools to develop or port HPC or AI applications to AMD GPUs. Participants will be introduced to the programming models for the MI200 and MI300 series GPUs and APUs. It has never been easier to program GPUs using a wide range of GPU programming models. We will cover how to use pragma-based languages such as OpenMP, the basic GPU programming language HIP, and performance portable languages such as Kokkos and RAJA. In addition, there will be presentations on other important topics such as GPU-aware MPI. The AMD tool suite, including the debugger, rocgdb, and the profiling tools rocprof, omnitrace, and omniperf will also be covered. A short introduction will be given into the AMD Machine Learning software stack including PyTorch. and Tensorflow and how they have been used in HPC.

After this course, participants will

have learned about the many GPU programming languages for AMD GPUs,
have gained knowledge about the AMD programming tools,
understand how to get performance scaling,
have been introduced to the AMD machine learning (ML) and artificial intelligence (AI) software,
know about profiling and debugging resources.

Prerequisites:

Some knowledge in GPU and/or HPC programming. Participants should have an application developer’s general knowledge of computer hardware, operating systems, and at least one HPC programming language.

Requirements:

Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on.

They are also required to abide by the ARCHER2 Code of Conduct.

Timetable:

Day 1 Tuesday October 1st – Topics Covered: AMD Programming Model, OpenMP

12:45 to 13:00 Drop in time
13:00 Host Organization Intro – Bob Robey
13:10 AMD Presentation Roadmap and Introduction to System for Exercises – Bob Robey
13:20 Programming Model for MI200 and MI300 series – Giacomo Capodaglio
13:45 Programming Model Exercises
14:00 Break
14:10 Introduction OpenMP® Offloading – Johanna Potyka
14:40 OpenMP® Exercises
14:55 Break
15:10 Real-World OpenMP® Language Constructs – Shelby Lockhart
15:45 OpenMP® Language Constructs Exercises
16:00 Advanced OpenMP® – zero-copy, debugging and optimization – Samuel Antao
16:30 Advanced OpenMP® Exercises
16:50 Wrap up – Bob Robey
Day 2 Wednesday October 2nd – Topics Covered: HIP and OpenMP®/HIP interoperability
12:45 to 13:00 Drop in Time
13:00 HIP and ROCm – Giacomo Capodaglio
14:00 HIP and ROCm Exercises
14:15 Break
14:30 Porting code to HIP – Giacomo Capodaglio
14:50 Porting Exercises
15:00 OpenMP® and HIP Interoperability – Bob Robey
15:40 Interoperability Exercises
16:00 Break
16:15 Optimizing HIP Code – Gina Sitaraman
16:40 HIP Optimization Exercises
16:55 Wrap up – Gina Sitaraman
Day 3 Thursday October 3rd – Topics Covered: MPI, Kokkos, C++ StdPar
12:45 to 13:00 Drop in time
13:00 GPU-Aware MPI on AMD GPUs – Shelby Lockhart
13:30 MPI Exercises
14:00 MPI Ghost Exchange Example with MI300A – Gina Sitaraman
14:30 MPI Ghost Exchange Exercises
14:50 Break
15:00 Performance Portability Frameworks (Kokkos) – Bob Robey
15:30 Kokkos Exercises
15:50 Break
16:00 C++ Std Par – Bob Robey
16:30 C++ Standard Parallelism Exercises
16:50 Wrap up – Bob Robey
Day 4 Friday October 4th – Topics Covered: AMD Debuggers and Profiling Tools
12:45 to 13:00 Drop in time
13:00 Debugging with Rocgdb – Samuel Antao
13:40 Rocgdb Exercises
14:00 Break
14:15 GPU Timeline Profiling (Rocprof, Omnitrace) – Georgios Markomanolis and Luka Stanisic
14:55 Timeline Profiling Exercises
15:15 Break
15:30 Kernel Profiling with Omniperf – Ian Bogle
16:15 Kernel Profiling Exercises
16:45 Additional Training Resources – Bob Robey
16:55 Wrap up – Bob Robey