ARCHER2 Weekly Newsletter


ARCHER2 maintenance

ARCHER2 will undergo a full maintenance session on Wednesday 8th May from 09:00 to 21:00. This is to allow an essential system update to take place.

During this maintenance, users will not be able to connect to the login nodes, jobs will not run and users will be unable to access data.

We will notify users once ARCHER2 is returned to service.

HPC Summer School

Apply for the High Performance Computing Summer School at EPCC in Edinburgh! This two-week programme, taking place at EPCC at the University of Edinburgh, Scotland, from 22nd June, will give around a dozen undergraduate students at UK universities the opportunity to learn and practise HPC technologies. Travel, accommodation and subsistence will all be covered.

More details and to apply

Parallel Performance Analysis using Scalasca/Score-P on ARCHER2 (for CPUs and HIP on the AMD GPUs)

Edinburgh, 29 - 30 April 2024, 09:30 - 17:00 BST

Scalasca/Score-P is a portable, free and open-source software toolset that supports the performance optimisation of parallel programs by measuring and analysing their runtime behaviour. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronisation – and offers guidance in exploring their causes. Scalasca uses execution profiles and traces generated by the community-developed Score-P instrumentation and measurement infrastructure.

The tool has been specifically designed for use on large-scale systems, but is also well suited for small and medium-scale HPC platforms. The software is available for free download under the New BSD open-source license.

Scalasca/Score-P targets scientific and engineering applications based on the programming interfaces MPI, CUDA, HIP and OpenMP/OpenACC, including hybrid applications that combine these interfaces with kernel offload to GPU accelerators. Note that for the AMD GPUs on ARCHER2, only instrumentation of HIP is currently supported.

This in-person course will cover how to use the tools in practice, delivered by members of the development team. Scalasca/Score-P is portable across HPC systems, but for this course practical exercises will be conducted on the UK National HPC Service ARCHER2 (an HPE/Cray EX system) including access to the AMD GPU nodes for profiling of codes using HIP; all attendees will be given accounts on ARCHER2 for the duration of the course. Although example parallel programs will be provided, attendees are encouraged to analyse the performance of their own applications.

Access to ARCHER2 will be available before the course starts to port and build applications; those who are unfamiliar with ARCHER2-GPU programming are encouraged to attend or view the recordings of recent ARCHER2 GPU online training courses including “Introduction to GPU programming with HIP”.

Further details and registration

2024 Educational Award For Outstanding Contribution to Computational Science Education

The ACM SIGHPC Education Chapter is seeking nominations for candidates for the 2024 Educational Award For Outstanding Contribution to Computational Science Education. We are seeking candidates who have led projects or programs that have made significant contributions to computational science education defined broadly to include all disciplines and all education levels.

The award will be presented at SC24. The recipient will receive a $2,000 cash award and travel support to attend the SC24 conference. Nominations will include a statement endorsing the nominee, and up to three letters of endorsement. Applications are due by Friday June 28, 2024, by end of day anywhere on earth. The chapter will choose up to one award winner and up to two honorable mentions.

More details, including the application forms and instructions

Questions concerning award eligibility and nominations.

Recently added known issues

The “Known Issues” page of the ARCHER2 Documentation (https://docs.archer2.ac.uk/known-issues/) lists all currently open known issues, including a description of each issue, its symptoms and any workarounds.

  • When close to storage quota, jobs may slow down or produce corrupted files (Added: 2024-02-27)
    When users are close to their user or project quota on the work (Lustre) file systems, we have seen cases of the following behaviour:
    • Jobs run very slowly as IO slows down
    • IO calls seem to complete successfully but not all data is written (so output is corrupted)
    • No “disk quota exceeded” error is seen

If you see these symptoms (slower than expected performance or data corruption), check whether you are close to your storage quota (either user or project quota). If you are, you may be experiencing this issue. Either remove data to free up space or request more storage quota.
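
As an illustration only, the two checks described above could be sketched in Python as follows. This is not an official ARCHER2 tool: the function names, the 90% threshold and the choice to treat a limit of zero as "no quota set" are all assumptions made for this example. The usage and limit figures would need to come from wherever you normally read your quota (for example the Lustre `lfs quota` command).

```python
import os


def near_quota(used_bytes: int, limit_bytes: int, threshold: float = 0.9) -> bool:
    """Return True when usage is at or above `threshold` of the quota.

    The 0.9 default is an arbitrary illustrative value, not an ARCHER2 figure.
    A limit of zero or less is treated here as "no quota set" (assumption).
    """
    if limit_bytes <= 0:
        return False
    return used_bytes / limit_bytes >= threshold


def write_and_verify(path: str, data: bytes) -> bool:
    """Write `data` to `path`, then confirm the file has the expected size.

    Because the issue above can leave IO calls looking successful while
    data is missing, the size on disk is compared with the bytes intended.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to storage
    return os.path.getsize(path) == len(data)
```

A size check like this only detects truncated output; it does not confirm that the contents are correct, so a checksum comparison may be more appropriate for critical data.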

Upcoming ARCHER2 Training

Further details of upcoming training

We always welcome researchers wishing to present their work in a webinar - please contact the Service Desk if you would be interested in presenting your work.

Recordings of past courses

Recordings of past virtual tutorials