HPE

The Centre of Excellence (CoE) forms part of the ARCHER2 service. Its mission is to support the service and engage with the ARCHER2 community so that users can take full advantage of the HPE Cray EX Supercomputer system. In the early years, most effort was concentrated on system bring-up, training and resolving issues. More recently, we have moved on to project-based work aimed at benefiting users, or groups of users, as a whole.

The centre comprises two members of staff located within EPCC’s offices in Edinburgh and three more working remotely, who collectively have many years of experience with HPC applications, software and hardware. These staff are augmented by other HPE staff who can be engaged on projects when appropriate. CoE staff are part of either the HPE HPC/AI CoEs group or the HPE HPC/AI EMEA Research Lab (ERL). The EMEA CoE group has experts in application tuning and optimisation for HPC and AI applications and is involved in CoEs at specific sites and other bespoke engagements. The HPC/AI EMEA Research Lab is engaged in various forms of customer collaboration. Its research interests include high-performance data analytics, I/O and memory hierarchy, data-centric analysis and optimisation, large-scale energy distribution system optimisation, application infrastructure software engineering and experimental system design. The ERL is also involved in country-level and Horizon Europe research projects and in European training network programmes. Further information can be found at the ERL site. Please get in touch if you wish to discuss a project that falls outside the focus of the ARCHER2 CoE.

Examples of some CoE activities are noted below:

  • Deep CSE and Application Support
    Investigation of deep or complex issues in the software stack, in addition to assisting the CSE Analysts with application migration and optimisation issues. This includes problem triage, root cause analysis, bug submission, and liaison with HPE and AMD experts.
  • Training and Education
    Provision of training courses for ARCHER2, and support for user forums and webinars on various topics.
  • Community Engagement and Knowledge Sharing
    Engagement with the wider user community in the UK and support for the eCSE programme.
  • Future Systems Evaluation
    For example, helping with the introduction of the ARCHER2 AMD GPU development platform and providing initial training.
  • Research Software Development
    Development of new monitoring software is underway to understand application usage of shared resources (initially I/O). We also work with EPCC to support investigations into energy reduction on ARCHER2.
  • HEC Consortia Support
    The CoE has recently started a new activity offering extra support for the HEC community on ARCHER2. We can help with optimising workflows, scaling up applications and porting to new architectures.
Significant experience is available via the CoE, ranging from expertise in particular application areas to insights on new technology directions. We can be reached via the ARCHER2 Service Desk.