The reports below highlight some of the research carried out on ARCHER and ARCHER2 by the ARCHER2 CSE Team.

Measured power draw of ARCHER2 compute cabinets

Emissions and energy efficiency on large-scale high performance computing facilities: ARCHER2 UK national supercomputing service case study

Large supercomputing facilities are critical to research in many areas that inform decisions such as how to address the current climate emergency, for example climate modelling, renewable energy facility design and new battery technologies. However, these systems are themselves a source of large amounts of emissions, due both to the embodied emissions associated with their construction, transport and decommissioning, and to the power consumption associated with running the facility. Recently, the UK National Supercomputing Service, ARCHER2, has been analysing the impact of the facility in terms of energy and emissions. Based on this work, we have made changes to the operation of the service that give a cumulative saving of more than 20% in the power draw of the computational resources, with all application benchmarks showing reduced power to solution. In this paper, we describe our analysis and the changes made to the operation of the service to improve its energy efficiency, and thereby reduce its climate impacts.
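The headline arithmetic behind operational emissions is simple, and the sketch below shows how emissions scale with power draw and grid carbon intensity, and why a 20% reduction in power draw feeds straight through to operational emissions. All numbers in it are purely illustrative assumptions, not ARCHER2 measurements.

```cpp
// Minimal sketch of the operational-emissions arithmetic:
// emissions = energy used x carbon intensity of the electricity supply.
// The numbers below are illustrative only, not ARCHER2 measurements.
#include <cstdio>

int main() {
    const double power_kw = 3000.0;           // assumed average facility power draw (kW)
    const double hours = 24.0 * 365.0;        // one year of operation
    const double intensity_g_per_kwh = 200.0; // assumed grid carbon intensity (gCO2e/kWh)

    const double energy_kwh = power_kw * hours;
    const double emissions_t = energy_kwh * intensity_g_per_kwh / 1.0e6; // tonnes CO2e

    // A 20% reduction in power draw reduces operational emissions proportionally.
    printf("energy: %.0f kWh/year, emissions: %.0f tCO2e/year, "
           "after a 20%% power saving: %.0f tCO2e/year\n",
           energy_kwh, emissions_t, 0.8 * emissions_t);
    return 0;
}
```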

Read the report

DoI: 10.48550/arXiv.2309.05440

Changes in power draw over the lifetime of the service, reflecting the changes made to improve energy efficiency and reduce power draw

ARCHER2 Net Zero Case Study

ARCHER2 is the UK’s National HPC service, operated by EPCC, the University of Edinburgh, at the Advanced Computing Facility. ARCHER2 provides an invaluable resource to UK academics to deliver world-class research, including research into the impact of climate change. As part of the wider DRI scoping project, this case study has investigated the emissions associated with ARCHER2 and makes a series of recommendations to move towards Net Zero for large-scale facilities of this type.
A detailed analysis of ARCHER2 has been conducted, covering its overall energy use, the type of electricity used, and how this will impact the carbon footprint of ARCHER2 over the service lifetime. Also included are investigations into the embodied energy (e.g. from PAIA or empirical analysis based on the area of printed circuit boards).
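As a rough illustration of how embodied and operational contributions combine over a service lifetime, the sketch below amortises a notional embodied footprint against the operational emissions accumulated over the operational years. Every number in it is a hypothetical placeholder rather than a figure from the case study.

```cpp
// Illustrative-only sketch of combining embodied and operational emissions
// over a service lifetime; none of these numbers come from the ARCHER2 study.
#include <cstdio>

int main() {
    const double embodied_tco2e = 1500.0;     // hypothetical embodied footprint (tCO2e)
    const double power_kw = 3000.0;           // hypothetical average power draw (kW)
    const double intensity_g_per_kwh = 100.0; // hypothetical grid carbon intensity (gCO2e/kWh)
    const double lifetime_years = 6.0;        // hypothetical service lifetime

    const double operational_tco2e =
        power_kw * 24.0 * 365.0 * lifetime_years * intensity_g_per_kwh / 1.0e6;
    const double total = embodied_tco2e + operational_tco2e;

    printf("embodied: %.0f tCO2e (%.0f%% of total), operational: %.0f tCO2e, total: %.0f tCO2e\n",
           embodied_tco2e, 100.0 * embodied_tco2e / total, operational_tco2e, total);
    return 0;
}
```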

Read the report

DoI: 10.5281/zenodo.7788498

Mean node power distribution, weighted by usage in node hours, broken down by software.

HPC-JEEP: Energy-based charging on the ARCHER2 HPC service

This report presents an analysis of the potential impacts of introducing energy-based charging on ARCHER2, the UK national supercomputing service. The work was done as part of the HPC-JEEP project, funded as part of the UKRI Net Zero DRI Scoping Project.
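To make the idea concrete, the sketch below contrasts a conventional node-hour charge with an energy-based charge for the same job. The job parameters and charging rates are hypothetical, not ARCHER2 tariffs.

```cpp
// Minimal sketch of what "energy-based charging" means in practice: charging
// a job by the energy it consumed rather than by the node hours it occupied.
// Rates and job numbers are hypothetical, not ARCHER2 tariffs.
#include <cstdio>

int main() {
    // Hypothetical job: 4 nodes for 10 hours, drawing 450 W per node on average.
    const double nodes = 4.0, hours = 10.0, watts_per_node = 450.0;

    const double node_hours = nodes * hours;
    const double energy_kwh = nodes * watts_per_node * hours / 1000.0;

    const double rate_per_node_hour = 1.0; // hypothetical allocation units per node hour
    const double rate_per_kwh = 2.0;       // hypothetical allocation units per kWh

    printf("node-hour charge: %.1f units, energy-based charge: %.1f units\n",
           node_hours * rate_per_node_hour, energy_kwh * rate_per_kwh);
    return 0;
}
```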

Read the report

DoI: 10.5281/zenodo.7708634

Disseminating energy usage

HPC-JEEP: Disseminating energy usage on the ARCHER2 and COSMA HPC services

This report presents an overview of approaches to disseminating energy use on HPC systems to users of the services on COSMA (part of the DiRAC UK national HPC service) and on ARCHER2 (the UK national supercomputing service). The work was done as part of the HPC-JEEP project, funded as part of the UKRI Net Zero DRI Scoping Project.

Read the report

DoI: 10.5281/zenodo.7797311

An overview page for a number of ARCHER2 graphs on Grafana

Automated service monitoring in the deployment of ARCHER2

The ARCHER2 service, a CPU-based HPE Cray EX system with 750,080 cores (5,860 nodes), was deployed throughout 2020 and 2021, going into full service in December 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team during a global pandemic, when collaboration and co-working were significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status which needed to be made available to external parties for diagnosis and interpretation. We describe how these checks have been deployed and how the data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500, and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.
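By way of illustration, a minimal automated availability check of the kind described might look like the sketch below, which queries Slurm's sinfo for node counts by state and flags the system if too few nodes are in service. The states treated as unavailable and the 95% threshold are illustrative assumptions, not the contractual ARCHER2 definitions.

```cpp
// Sketch of a simple automated availability check driven by Slurm's sinfo.
// The flagged states and the 95% threshold are illustrative assumptions.
#include <cstdio>
#include <cstring>

int main() {
    // "%D" = node count, "%T" = node state; "-h" suppresses the header line.
    FILE *pipe = popen("sinfo -h -o \"%D %T\"", "r");
    if (!pipe) { perror("popen"); return 1; }

    long total = 0, unavailable = 0;
    char line[256];
    while (fgets(line, sizeof(line), pipe)) {
        long count = 0;
        char state[64] = {0};
        if (sscanf(line, "%ld %63s", &count, state) == 2) {
            total += count;
            // Treat down/drained/failed states as unavailable.
            if (strstr(state, "down") || strstr(state, "drain") || strstr(state, "fail"))
                unavailable += count;
        }
    }
    pclose(pipe);

    if (total == 0) { fprintf(stderr, "no data from sinfo\n"); return 1; }
    const double availability = 100.0 * (total - unavailable) / total;
    printf("availability: %.2f%% (%ld of %ld nodes)\n", availability,
           total - unavailable, total);
    return availability < 95.0; // non-zero exit would trigger an alert upstream
}
```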

Read the report

DoI: 10.48550/arXiv.2303.11731

AMD MI250X GPU

Emerging technologies: an AMD MI250X GPU-based platform for HPC applications

Supporting AMD MI250X GPU-based platforms for HPC.

At EPCC, we have experience of providing support to users of NVIDIA V100/A100 GPUs that exist on various Tier-2 HPC systems. However, until now, we have had little knowledge of how to support the use of AMD GPUs, specifically, the AMD Instinct MI200 series of accelerators.

We felt it was necessary to address this shortcoming given that our Tier-1 HPC system, ARCHER2, is built on AMD processors. Furthermore, the top Cray EX supercomputers available today (Frontier and El Capitan) feature AMD accelerator technology. And so, after gaining access to suitable hardware, we conducted an evaluation exercise in order to assess how easily EPCC support staff could, when dealing with AMD GPUs, undertake typical HPC support activities such as training, code compilation/debugging and code profiling/benchmarking.

The details of this evaluation can be found in the attached report.
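As a flavour of the kind of compile-and-run exercise involved, the sketch below is a minimal HIP vector addition of the sort that can be built with hipcc and run on an MI250X. It is an illustrative example, not code taken from the evaluation.

```cpp
// Minimal HIP "vector add" example; build with e.g. `hipcc vadd.cpp -o vadd`.
// Illustrative only, not code from the evaluation report.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vadd(const double *a, const double *b, double *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<double> ha(n, 1.0), hb(n, 2.0), hc(n, 0.0);

    double *da, *db, *dc;
    hipMalloc(&da, n * sizeof(double));
    hipMalloc(&db, n * sizeof(double));
    hipMalloc(&dc, n * sizeof(double));
    hipMemcpy(da, ha.data(), n * sizeof(double), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(double), hipMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    hipLaunchKernelGGL(vadd, dim3(grid), dim3(block), 0, 0, da, db, dc, n);
    hipDeviceSynchronize();

    hipMemcpy(hc.data(), dc, n * sizeof(double), hipMemcpyDeviceToHost);
    printf("c[0] = %f (expect 3.0)\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```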

Read the report

DoI: 10.5281/zenodo.7752810

Parallel IO on ARCHER2 disk image

Parallel IO on ARCHER2

File input and output can become bottlenecks for parallel programs running on large numbers of processors. In this work we investigate how to improve MPI-IO performance on ARCHER2, the UK National HPC service, using the Cray Lustre lockahead options, which have not previously been investigated on this system. We then investigate the performance of ADIOS2, a more modern parallel IO library, and look at the performance of IO from a real HPC application rather than a synthetic benchmark.
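For readers unfamiliar with MPI-IO hints, the sketch below shows a collective write in which each rank writes a contiguous block of a shared file, with a Cray MPICH info hint requesting lockahead locking. It is a simplified illustration rather than the benchmark code used in the report, and the hint name and value should be checked against the MPI man pages on the target system.

```cpp
// Simplified collective MPI-IO write: each rank writes a contiguous block of
// a shared file. The "cray_cb_write_lock_mode" hint is, to our understanding,
// the Cray MPICH control for Lustre lockahead locking; verify it locally.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nlocal = 1 << 20;                    // doubles written per rank
    std::vector<double> data(nlocal, (double)rank);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cray_cb_write_lock_mode", "2"); // request lockahead locks (assumed hint)

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    MPI_Offset offset = (MPI_Offset)rank * nlocal * sizeof(double);
    MPI_File_write_at_all(fh, offset, data.data(), nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```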

Read the report

DoI: 10.5281/zenodo.7643563

Rust in HPC Logo

Emerging Technologies: Rust in HPC

This technical report is a short investigation into how Rust could be used for a scientific application on an HPC system. A computational fluid dynamics model of fluid flow into and out of a box was developed in Rust and compared to the same algorithm implemented in C and Fortran. These simulations were performed for both serial and parallelised versions over a variety of problem sizes. The report discusses the results of these simulations and the implications for using Rust as a tool for scientific programming in HPC.

Read the report

DoI: 10.5281/zenodo.7620406

Hybrid Tasks Logo

Hybrid parallel programming with tasks

This technical report is an introduction to using a hybrid parallel programming model that combines MPI with OmpSs or OpenMP dependent tasks. This model allows both computation and communication to be expressed using a coarse-grained dataflow approach, which helps to remove most of the unnecessary ordering constraints and intranode synchronisation imposed by the more conventional approach of MPI with OpenMP parallel loops. The report describes the model, and how it is supported by an augmented MPI library which interoperates with the tasking runtimes. It also assesses some of the advantages and disadvantages of this style of parallel programming. 
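The sketch below gives a plain MPI plus OpenMP flavour of the idea: a halo receive, a boundary send and the update that depends on the halo are all expressed as tasks with depend clauses rather than as ordered phases. It does not use the augmented MPI library or OmpSs described in the report and, as the comments note, relying on blocking MPI calls inside tasks in this way is exactly the kind of fragility that library is designed to remove.

```cpp
// Plain MPI + OpenMP sketch of the dependent-task idea. NOT the augmented MPI
// library from the report: with plain MPI, blocking calls inside tasks need
// MPI_THREAD_MULTIPLE and at least two threads per rank here to avoid
// deadlock, which is the fragility the augmented library addresses.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int left  = (rank - 1 + size) % size;
    const int right = (rank + 1) % size;

    double interior[1000], halo = 0.0;
    for (int i = 0; i < 1000; ++i) interior[i] = rank;

    #pragma omp parallel
    #pragma omp single
    {
        // Receive the halo value from the left neighbour.
        #pragma omp task depend(out: halo)
        MPI_Recv(&halo, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Send our boundary value to the right neighbour; this can run
        // concurrently with the receive on another thread.
        #pragma omp task depend(in: interior[999])
        MPI_Send(&interior[999], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);

        // Update that needs the halo: the depend clause defers it until the
        // receive task has completed, with no explicit barrier.
        #pragma omp task depend(in: halo) depend(inout: interior[0])
        interior[0] += halo;
    } // all tasks complete before the parallel region ends

    if (rank == 0) printf("interior[0] = %f\n", interior[0]);
    MPI_Finalize();
    return 0;
}
```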

Read the report

DoI: 10.5281/zenodo.7524540

HPC JEEP Job Efficiency and Energy Usage

HPC-JEEP: Energy Usage on ARCHER2 and the DiRAC COSMA HPC services

This report presents an analysis of the energy use by software and research communities on two large UK national HPC services: ARCHER2 (the UK national supercomputing service) and DiRAC COSMA (part of the DiRAC national HPC service). The work was done as part of the HPC-JEEP project funded as part of the UKRI Net Zero DRI Scoping Project.
Image credit: Net Zero Digital Research Infrastructure Scoping Project
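The kind of aggregation underlying such an analysis can be sketched as below: summing per-job energy by application code and reporting each code's share of the total. The job records shown are invented for illustration; a real analysis would read them from the scheduler's accounting data.

```cpp
// Illustrative sketch of aggregating per-job energy use by application code.
// The job records are invented; real data would come from job accounting.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct JobRecord {
    std::string application; // e.g. the executable or code name
    double energy_kwh;       // energy consumed by the job
};

int main() {
    const std::vector<JobRecord> jobs = {
        {"VASP", 120.0}, {"CASTEP", 80.0}, {"VASP", 200.0}, {"LAMMPS", 60.0},
    };

    std::map<std::string, double> energy_by_code;
    double total = 0.0;
    for (const auto &job : jobs) {
        energy_by_code[job.application] += job.energy_kwh;
        total += job.energy_kwh;
    }

    for (const auto &[code, kwh] : energy_by_code)
        printf("%-8s %8.1f kWh (%5.1f%% of total)\n", code.c_str(), kwh, 100.0 * kwh / total);
    return 0;
}
```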

Read the report

DoI: 10.5281/zenodo.7137390