This presentation combines four lightning talks, each covering an aspect of code performance on HPC platforms.

Running Python code at scale

Running Python code at scale (e.g., > 10,000 MPI ranks) can overload the file system metadata server if every process is working through the same sequence of Python imports. The Spindle tool is intended to mitigate this issue by caching data within node memory. We demonstrate how to use Spindle on ARCHER2 and on Cirrus.
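To give a feel for the scale of the problem, the following is a minimal sketch, not part of the talk material. It counts how many modules in a running interpreter are backed by a file and multiplies that by a rank count. The function names and the choice of `json` as the example import are illustrative assumptions, and the count is only a rough proxy for metadata load, since it ignores directory lookups and bytecode caches.

```python
import json  # an ordinary stdlib import, used here only as an example
import sys


def file_backed_module_count():
    """Count loaded modules that report a source or extension file."""
    return sum(
        1
        for module in list(sys.modules.values())
        if getattr(module, "__file__", None)
    )


def estimated_file_opens(ranks, modules_per_rank):
    """Rough estimate: every rank opens every module file once."""
    return ranks * modules_per_rank


if __name__ == "__main__":
    per_rank = file_backed_module_count()
    ranks = 10_000  # the order of magnitude mentioned above
    total = estimated_file_opens(ranks, per_rank)
    print(f"{per_rank} file-backed modules per process")
    print(f"~{total} file opens across {ranks} ranks")
```

Even a modest import list is multiplied by the number of ranks, which is why caching within node memory can help.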

Realistic ML benchmarks

Realistic ML benchmarks are a useful means of checking the performance of complex HPC software stacks that encompass Python, MPI, and GPU support. We show how to run the MLPerf CosmoFlow benchmark on ARCHER2 using a centrally-installed TensorFlow (Lmod) module.

Roofline Models with Intel Advisor

The roofline plot can be a succinct way to identify underperforming elements within your code. Further, such a plot shows whether a particular element (e.g., subroutine, function or loop) sits under the memory-bound or the compute-bound part of the roofline, thereby indicating how that element's design could be altered for improved performance. We present, for Cirrus and ARCHER2, the use of the Intel Advisor tool for generating roofline plots.
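The idea behind the roofline can be sketched in a few lines. In the standard model, attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity (FLOPs per byte moved) and peak memory bandwidth. The function names and the machine figures below are hypothetical, chosen only for illustration; they do not describe Cirrus or ARCHER2, and this is not how Intel Advisor computes its plots internally.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(compute roof, intensity * memory bandwidth)."""
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def classify(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Label a code element by which roof limits it."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bandwidth = 1000.0, 100.0
    for intensity in (2.0, 50.0):  # hypothetical loops, in FLOP/byte
        bound = attainable_gflops(intensity, peak, bandwidth)
        kind = classify(intensity, peak, bandwidth)
        print(f"AI={intensity}: at most {bound} GFLOP/s ({kind})")
```

A memory-bound element generally benefits from moving less data per operation (for example, better cache reuse), whereas a compute-bound element benefits more from vectorisation or reducing the operation count.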

Fortran-based OpenMP offloading

Lastly, we present a worked example of Fortran-based OpenMP offloading for the Cirrus machine. It is likely that the next generation of HPC machines will feature GPU hardware, so we hope this example will be useful to users who are thinking about accelerating their code in this way. OpenMP offloading has the advantage of not being tied to a particular GPU vendor.

This online session is open to all. It will use the Blackboard Collaborate platform.

Video