This course will provide an introduction to GPU computing with CUDA aimed at scientific application programmers. The course will give a background on the differences between CPU and GPU architectures as a prelude to introductory exercises in CUDA programming. The course will discuss the execution of kernels, memory management, and shared memory operations. Common performance issues and their solutions will be discussed. The course will also cover some of the currently available alternatives to CUDA (OpenCL, OpenACC, and Kokkos).

A separate “Hackathon Day” will be available for attendees to try out their own problems (or a ‘canned’ extended example) with the help of staff from both EPCC and NVIDIA.

Learning Outcomes

At the end of the course, attendees should be in a position to make an informed decision on how to approach GPU parallelisation in their applications in an efficient and portable manner.


Attendees must be familiar with programming in C or C++ (a number of the baseline CUDA exercises are also available using CUDA Fortran). Some knowledge of parallel/threaded programming models would be useful. Access to a GPU machine will be supplied.

Note: this course will not address machine learning or any machine learning frameworks.


Participants must bring a laptop running a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges.

They are also required to abide by the ARCHER2 Training Code of Conduct.


(Wednesday is a rest day.)

Detailed timetable to follow

Course materials


Day 1

Part 1

Part 2

Part 3

Day 2

Part 1

Part 2


This course is part-funded by the PRACE project and is free to all.