GPU Programming with CUDA

Please note:

This is not an ARCHER2 course, but it is advertised here as it is likely to be of interest to many members of the ARCHER2 community.

This course will run on Cirrus, not ARCHER2.

Location:

This course will take place face-to-face at The Open University, Milton Keynes.

This course will not be streamed online and a recording will not be made.

Outline

This short course will provide an introduction to GPU computing with CUDA, aimed at scientific application programmers. The course will give a background on the difference between CPU and GPU architectures as a prelude to introductory exercises in CUDA programming. The course will discuss the execution of kernels, memory management, and shared memory operations. Common performance issues will be discussed along with their solutions. The course will also cover some of the currently available alternatives to CUDA (OpenCL, OpenACC, and Kokkos).

Templates will be provided to do the practical examples in Python using PyCUDA, although this still requires the computational kernels to be written in C.

Note: this course will not address machine learning or any machine learning frameworks.

Learning Outcomes

At the end of the course, attendees should be in a position to make an informed decision on how to approach GPU parallelisation in their applications in an efficient and portable manner.

Pre-requisites

Attendees must be familiar with programming in C or C++ (a number of the baseline CUDA exercises are also available using CUDA Fortran). Some knowledge of parallel/threaded programming models would be useful. Access to a GPU machine will be supplied.

Requirements:

Participants must bring a laptop running a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges.

They are also required to abide by the ARCHER2 Code of Conduct.

Timetable:

Provisional

10:00 Introduction
10:20 GPU Concepts/Architectures
11:00 Break
11:20 CUDA Programming
12:00 A first CUDA exercise
13:00 Lunch
14:00 CUDA Optimisations
14:20 Optimisation Exercise
15:00 Break
15:20 Constant and Shared Memory
16:00 Exercise
17:00 Close

Course materials

Course Chat

Feedback

Please let us know what was great about this course and anything we can improve.

This course is part-funded by the PRACE project and is free to all.