- Current System Load - Full System
- Service Alerts
- Maintenance Sessions
- System Status Mailings
- FAQ
- Usage statistics
Current System Load - CPU
The plot below shows the status of nodes on the current ARCHER2 Full System service. A description of each of the status types is provided below the plot.
- alloc: Nodes running user jobs
- idle: Nodes available for user jobs
- resv: Nodes in reservation and not available for standard user jobs
- plnd: Nodes planned for use by a future job. If pending jobs can fit in the gap before the future job is due to start, they can run on these nodes (often referred to as backfilling).
- down, drain, maint, drng, comp, boot: Nodes unavailable for user jobs
- mix: Nodes in multiple states
Note: the long-running reservation visible in the plot corresponds to the short QoS, which is used to support small, short jobs with fast turnaround time.
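The state codes above follow Slurm's `sinfo` short codes, so they can be grouped into availability categories when post-processing scheduler output. A minimal Python sketch, assuming that grouping; the `classify_state` helper and its category labels are illustrative, not part of any ARCHER2 tooling:

```python
# Group Slurm sinfo state codes into the categories described above.
# Illustrative helper only; category names are assumptions, not ARCHER2 tooling.
RUNNING = {"alloc"}
AVAILABLE = {"idle"}
RESERVED = {"resv", "plnd"}
UNAVAILABLE = {"down", "drain", "maint", "drng", "comp", "boot"}

def classify_state(code: str) -> str:
    # sinfo may append flag suffixes such as '*' (non-responding) or '~' (powered down)
    code = code.rstrip("*~#!%$@^-")
    if code in RUNNING:
        return "running user jobs"
    if code in AVAILABLE:
        return "available for user jobs"
    if code in RESERVED:
        return "reserved/planned"
    if code in UNAVAILABLE:
        return "unavailable"
    if code == "mix":
        return "multiple states"
    return "unknown"
```

A helper like this can be applied to each line of `sinfo --noheader --format="%t %D"` output to summarise how many nodes fall into each category.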
Current System Load - GPU
- alloc: Nodes running user jobs
- idle: Nodes available for user jobs
- resv: Nodes in reservation and not available for standard user jobs
- plnd: Nodes planned for use by a future job. If pending jobs can fit in the gap before the future job is due to start, they can run on these nodes (often referred to as backfilling).
- down, drain, maint, drng, comp, boot: Nodes unavailable for user jobs
- mix: Nodes in multiple states
Service Alerts
The ARCHER2 documentation also covers some Known Issues which users may encounter when using the system.
Status | Type | Start | End | Scope | User Impact | Reason |
---|---|---|---|---|---|---|
At-Risk | Service Alert | 2024-04-25 14:00 | 2024-04-25 16:00 | Connectivity to ARCHER2 may have a short outage but no impact is expected | We do not expect any user impact but if there is an issue it will be a short connectivity outage | Changing power supply for the JANET CIENA unit |
Ongoing | Service Alert | 2024-04-24 08:00 | | Emails from Service Desk | We believe that emails being sent from the ARCHER2 Service Desk are being delayed downstream, causing them not to be received promptly. We are working to resolve this. | |
Ongoing | Service Alert | 2024-04-23 12:00 | | ARCHER2 work (fs3) file system | Slow response when accessing data on the file system. Update 24th April: We are continuing to investigate and our on-site HPE support team have escalated the issue. Darshan IO monitoring has been enabled for all jobs to help identify the issue. | Extreme load on metadata server. |
Previous Service Alerts
This section lists resolved service alerts from the past 30 days. A full list of historical resolved service alerts is available.
Status | Type | Start | End | Scope | User Impact | Reason |
---|---|---|---|---|---|---|
Resolved | Service Alert | 2024-04-22 08:00 | 2024-04-22 12:00 | Connectivity to ARCHER2 may have a short outage but no impact is expected | We do not expect any user impact but if there is an issue it will be a short connectivity outage | Changing power supply for the JANET CIENA unit |
Resolved | Service Alert | 2024-04-15 14:00 | 2024-04-15 16:00 | ARCHER2 rundeck ticketing server | May be a delay in processing new user requests via SAFE | Physical moving of the server hosting the rundeck ticketing system |
Resolved | Service Alert | 2024-04-15 15:30 | 2024-04-15 16:40 | ARCHER2 login node | Users cannot currently connect to ARCHER2 | Physical moving of the server hosting the ARCHER2 ldap server |
Resolved | Service Alert | 2024-04-15 10:00 | 2024-04-15 10:30 | Outage to DNS server which will impact ARCHER2 and ARCHER2 SAFE | Users can still connect to service but may be unable to access external websites (eg GitLab) | Migration of server in preparation of the wider power work affecting site the following week |
Resolved | Service Alert | 2024-04-11 10:00 | 2024-04-11 10:40 | ARCHER2 rundeck ticketing server | May be a delay in processing new user requests via SAFE | Migration of the rundeck ticketing system |
Resolved | Service Alert | 2024-04-09 10:00 | 2024-04-09 11:00 | ARCHER2 slurm scheduler | The ARCHER2 Slurm controller will be restarted this morning. Running jobs will continue to run, but Slurm commands will be unavailable for a few minutes. | Adjustment of a scheduling parameter |
Resolved | Service Alert | 2024-04-06 21:37 | 2024-04-06 23:45 | ARCHER2 work4 (fs4) file system | Partial loss of access to work4 (fs4) for a short while | HPE Support are investigating root cause. |
Resolved | Service Alert | 2024-03-27 11:45 | 2024-03-28 11:45 | All parallel jobs launched using srun | All parallel jobs launched using `srun` will have their IO profile captured by the Darshan IO profiling tool. In rare cases this may cause jobs to fail or impact performance. Users can disable Darshan by adding the line `module remove darshan` before they use `srun` in their job submission scripts. | Capturing data on the IO use on ARCHER2 to improve the service. |
Maintenance Sessions
This section lists recent and upcoming maintenance sessions. A full list of past maintenance sessions is available.
Status | Type | Start | End | Scope | User Impact | Reason |
---|---|---|---|---|---|---|
Planned | Full | 2024-05-08 09:00 | 2024-05-08 21:00 | Full ARCHER2 System | Users will not be able to connect to the login nodes, jobs will not run and users will be unable to access data during this maintenance | Replacement of operating system certificates |
Completed | Partial | TBC | | ARCHER2 | Running jobs will continue but users will not be able to submit new jobs. Users will be notified when job submission is available again. | Integrating the GPU nodes into ARCHER2 |
System Status Mailings
If you would like to receive email notifications about system issues and outages, please subscribe to the System Status Notifications mailing list via SAFE.
FAQ
Usage statistics
This section contains data on ARCHER2 usage for Mar 2024. Access to historical usage data is available at the end of the section.
Usage by job size and length
Queue length data
The colour indicates the scheduling coefficient, computed as [run time] / ([run time] + [queue time]). A scheduling coefficient of 1 indicates zero time spent queuing; a scheduling coefficient of 0.5 means the job spent as long queuing as it did running.
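The coefficient can be computed directly from a job's run and queue times. A minimal sketch; the function name is illustrative:

```python
def scheduling_coefficient(run_time_s, queue_time_s):
    """Scheduling coefficient = run / (run + queue).

    1.0 means the job spent no time queuing; 0.5 means it queued
    for as long as it ran. Times are in seconds (any consistent
    unit works, since the result is a ratio).
    """
    total = run_time_s + queue_time_s
    if total == 0:
        return None  # undefined for zero-length records
    return run_time_s / total
```

For example, a job that ran for one hour with no queue time scores 1.0, while a job that queued for one hour and ran for one hour scores 0.5, matching the definitions above.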
Software usage data
Plot and table of % use and job step size statistics for different software on ARCHER2 for Mar 2024. This data is also available as a CSV file.
This table shows job step size statistics in cores weighted by usage, total number of job steps and percent usage broken down by different software for Mar 2024.
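"Weighted by usage" means each job step contributes to the quantiles in proportion to the node-hours it consumed, rather than counting once. A minimal sketch of a weighted quantile under that assumption; the function and sample data are illustrative:

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value at which the cumulative weight
    reaches fraction q (0 <= q <= 1) of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]

# Illustrative data: job step sizes in cores, weighted by node-hours.
sizes = [128, 512, 1024, 4096]
node_hours = [10.0, 50.0, 30.0, 10.0]
median_size = weighted_quantile(sizes, node_hours, 0.5)  # 512: half the
# node-hours were consumed by steps of 512 cores or fewer
```

Q1, median, Q3 in the table correspond to q = 0.25, 0.5, and 0.75 respectively.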
Software | Min | Q1 | Median | Q3 | Max | Jobs | Nodeh | PercentUse | Users | Projects |
---|---|---|---|---|---|---|---|---|---|---|
Overall | 1 | 512.0 | 1152.0 | 4096.0 | 131072 | 3305883 | 2636925.9 | 100.0 | 860 | 123 |
VASP | 8 | 512.0 | 1024.0 | 1536.0 | 65536 | 199424 | 545132.8 | 20.7 | 141 | 13 |
Unknown | 1 | 608.0 | 1200.0 | 4096.0 | 82944 | 2364813 | 432989.6 | 16.4 | 398 | 92 |
Met Office UM | 2 | 1024.0 | 1152.0 | 6400.0 | 27556 | 34152 | 273761.1 | 10.4 | 53 | 2 |
LAMMPS | 1 | 512.0 | 2048.0 | 38400.0 | 131072 | 9769 | 210171.0 | 8.0 | 51 | 21 |
CP2K | 1 | 128.0 | 256.0 | 1024.0 | 8192 | 58384 | 160689.8 | 6.1 | 53 | 10 |
GROMACS | 1 | 640.0 | 1280.0 | 3840.0 | 9600 | 12087 | 144394.7 | 5.5 | 48 | 11 |
OpenFOAM | 1 | 1024.0 | 2048.0 | 5120.0 | 12800 | 2584 | 98252.5 | 3.7 | 45 | 14 |
CASTEP | 1 | 192.0 | 512.0 | 1024.0 | 4096 | 311969 | 86162.1 | 3.3 | 53 | 10 |
Python | 1 | 1024.0 | 4096.0 | 16384.0 | 16384 | 51278 | 76511.1 | 2.9 | 70 | 26 |
SENGA | 1 | 13560.0 | 37500.0 | 37500.0 | 37500 | 100 | 48693.1 | 1.8 | 5 | 3 |
Nektar++ | 16 | 4096.0 | 5120.0 | 10240.0 | 12800 | 528 | 47629.9 | 1.8 | 9 | 3 |
ONETEP | 1 | 128.0 | 512.0 | 2048.0 | 2048 | 1880 | 43623.9 | 1.7 | 8 | 2 |
ChemShell | 1 | 512.0 | 1024.0 | 6912.0 | 38400 | 993 | 43029.1 | 1.6 | 13 | 4 |
Xcompact3d | 128 | 4096.0 | 8192.0 | 16384.0 | 32768 | 324 | 40674.7 | 1.5 | 9 | 5 |
FHI aims | 8 | 256.0 | 512.0 | 1024.0 | 4096 | 74096 | 35298.5 | 1.3 | 17 | 3 |
MITgcm | 1 | 200.0 | 363.0 | 615.0 | 1024 | 15013 | 33927.7 | 1.3 | 20 | 2 |
NWChem | 1 | 256.0 | 640.0 | 1024.0 | 1280 | 59972 | 27830.2 | 1.1 | 15 | 5 |
iIMB | 256 | 3200.0 | 6400.0 | 6400.0 | 6400 | 57 | 25740.2 | 1.0 | 2 | 2 |
NEMO | 1 | 1568.0 | 7232.0 | 32768.0 | 65536 | 15466 | 23769.3 | 0.9 | 16 | 3 |
Code_Saturne | 8 | 4096.0 | 5376.0 | 6144.0 | 131072 | 337 | 22133.0 | 0.8 | 7 | 3 |
BOUT++ | 768 | 768.0 | 1344.0 | 1344.0 | 1344 | 128 | 21806.5 | 0.8 | 1 | 1 |
Quantum Espresso | 1 | 432.0 | 896.0 | 1024.0 | 4096 | 61440 | 21586.0 | 0.8 | 18 | 5 |
Nek5000 | 768 | 25600.0 | 25600.0 | 25600.0 | 25600 | 47 | 20001.4 | 0.8 | 2 | 2 |
EPOCH | 128 | 5120.0 | 5120.0 | 5120.0 | 8192 | 124 | 19073.6 | 0.7 | 4 | 1 |
Hydro3D | 4 | 2100.0 | 2500.0 | 2500.0 | 25600 | 195 | 18117.3 | 0.7 | 5 | 3 |
CASINO | 128 | 1024.0 | 2048.0 | 2560.0 | 4096 | 115 | 16719.3 | 0.6 | 2 | 2 |
SU2 | 10 | 512.0 | 640.0 | 2560.0 | 3840 | 1475 | 11910.7 | 0.5 | 6 | 2 |
CRYSTAL | 1 | 128.0 | 128.0 | 512.0 | 4096 | 21598 | 10433.1 | 0.4 | 6 | 3 |
3DNS | 8800 | 8800.0 | 15960.0 | 17680.0 | 50217 | 10 | 8861.7 | 0.3 | 2 | 1 |
CESM | 1 | 512.0 | 1280.0 | 4096.0 | 4096 | 3793 | 7923.1 | 0.3 | 8 | 1 |
a.out | 1 | 512.0 | 20480.0 | 20480.0 | 20480 | 169 | 6135.4 | 0.2 | 12 | 10 |
NAMD | 64 | 64.0 | 512.0 | 512.0 | 512 | 702 | 6128.1 | 0.2 | 5 | 4 |
VAMPIRE | 8 | 256.0 | 1024.0 | 2048.0 | 65536 | 491 | 5723.3 | 0.2 | 9 | 3 |
GENE | 36 | 2048.0 | 2304.0 | 4096.0 | 9216 | 239 | 5621.5 | 0.2 | 3 | 2 |
OSIRIS | 12288 | 12288.0 | 12288.0 | 24576.0 | 24576 | 28 | 5363.0 | 0.2 | 1 | 1 |
SBLI | 128 | 4096.0 | 65536.0 | 65536.0 | 65536 | 76 | 4621.1 | 0.2 | 2 | 1 |
ptau3d | 8 | 240.0 | 240.0 | 240.0 | 1024 | 236 | 4425.3 | 0.2 | 3 | 2 |
TPLS | 128 | 1024.0 | 2048.0 | 4096.0 | 8192 | 68 | 3780.4 | 0.1 | 2 | 1 |
WRF | 16 | 384.0 | 384.0 | 384.0 | 384 | 144 | 3226.1 | 0.1 | 4 | 3 |
FEniCS | 128 | 131072.0 | 131072.0 | 131072.0 | 131072 | 43 | 2421.7 | 0.1 | 1 | 1 |
Smilei | 1 | 256.0 | 256.0 | 4096.0 | 4096 | 238 | 2182.8 | 0.1 | 4 | 1 |
DL_POLY | 1 | 256.0 | 2560.0 | 2560.0 | 4096 | 231 | 2159.1 | 0.1 | 3 | 3 |
HYDRA | 1 | 6400.0 | 12800.0 | 12800.0 | 12800 | 131 | 1879.9 | 0.1 | 6 | 3 |
RMT | 128 | 640.0 | 896.0 | 896.0 | 896 | 98 | 1877.3 | 0.1 | 2 | 1 |
FVCOM | 640 | 640.0 | 640.0 | 640.0 | 640 | 15 | 1438.3 | 0.1 | 1 | 1 |
OpenSBLI | 384 | 131072.0 | 131072.0 | 131072.0 | 131072 | 17 | 1231.5 | 0.0 | 2 | 2 |
SIESTA | 4 | 32.0 | 2048.0 | 5120.0 | 6656 | 82 | 1063.2 | 0.0 | 3 | 1 |
Elk | 32 | 256.0 | 384.0 | 512.0 | 512 | 49 | 330.9 | 0.0 | 2 | 2 |
PeleLMeX | 16384 | 16384.0 | 32768.0 | 32768.0 | 32768 | 2 | 152.6 | 0.0 | 1 | 1 |
Amber | 128 | 1152.0 | 1536.0 | 1792.0 | 2048 | 64 | 123.9 | 0.0 | 1 | 1 |
PRECISE | 8 | 1536.0 | 2048.0 | 2048.0 | 2560 | 33 | 63.6 | 0.0 | 1 | 1 |
Arm Forge | 1 | 512.0 | 1024.0 | 1024.0 | 2048 | 246 | 49.9 | 0.0 | 14 | 8 |
DL_MESO | 64 | 64.0 | 64.0 | 64.0 | 64 | 13 | 32.1 | 0.0 | 1 | 1 |
EDAMAME | 64 | 64.0 | 64.0 | 64.0 | 64 | 31 | 27.6 | 0.0 | 1 | 1 |
PDNS3D | 1024 | 1024.0 | 1024.0 | 1024.0 | 1024 | 3 | 14.5 | 0.0 | 1 | 1 |
AxiSEM3D | 4 | 128.0 | 128.0 | 128.0 | 128 | 26 | 4.3 | 0.0 | 1 | 1 |
HemeLB | 1 | 32.0 | 256.0 | 256.0 | 256 | 45 | 1.1 | 0.0 | 2 | 2 |
SPARTA | 8192 | 8192.0 | 8192.0 | 8192.0 | 8192 | 1 | 0.3 | 0.0 | 1 | 1 |
GS2 | 2 | 2.0 | 2.0 | 2.0 | 2 | 201 | 0.1 | 0.0 | 1 | 1 |
GPAW | 128 | 128.0 | 128.0 | 128.0 | 128 | 9 | 0.0 | 0.0 | 1 | 1 |
ludwig | 1 | 1.0 | 1.0 | 1.0 | 1 | 1 | 0.0 | 0.0 | 1 | 1 |