Business continuity management for ARCHER2


If you run a datacentre full of expensive kit that supports a variety of services, including ARCHER2, how do you minimise the chances of something negatively impacting those services? And if something does happen that affects one or more of them, how do you decide what to work on first?

Where do you put the resources and preventative measures to provide the maximum protection, and how do you test that they are working? How do you prioritise work after such an event? We are looking at these sorts of questions as we head towards our first ISO 22301 business continuity management audit.

Our datacentre is key to the successful delivery of ARCHER2, as the equipment and infrastructure are housed there. If something major happened to the datacentre, we may well not be able to deliver the ARCHER2 service. If, for example, there is a small fire in our datacentre, we need to make sure there are sufficient measures in place to keep staff safe, to detect the fire, to put it out or contain it, and to minimise damage. If we call out the fire brigade, we need to make sure they know that we have very high voltage electricity in the building and that it wouldn’t be safe to pump thousands of gallons of water through it until the power is off.

With the power off, we still need to be able to get back into the building once the fire is out, even if the electronic door system is not working. We need to know which infrastructure is key to the delivery of each service. We need to know what the priorities are, and to have playbooks developed and tested so that the recovery work goes smoothly and disruption to users is minimised.

Looking at ARCHER2, we run a number of service elements, each with very different time requirements for bringing it back into service. For example, we provide the ARCHER2 service desk to help our users and to communicate with them, and this has tight response targets that we need to meet. We also run training courses, with a number of days of training spread over a year. We deliver outreach events. We carry out maintenance on the ARCHER2 equipment. We are currently carrying out a business impact analysis to identify the priorities for the different service elements, which will help us plan the order in which services are resumed and the time constraints for doing so.
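
As a rough illustration of how the output of a business impact analysis can be turned into a resumption order, here is a minimal Python sketch. It assumes each service element is given a single target recovery time in hours. The `ServiceElement` class, the `resumption_order` function and all of the hour values are hypothetical placeholders for illustration, not our actual targets or tooling.

```python
from dataclasses import dataclass


@dataclass
class ServiceElement:
    name: str
    rto_hours: float  # hypothetical target time to resume this element


# Placeholder values for illustration only; not the real ARCHER2 targets.
elements = [
    ServiceElement("Outreach events", 720),
    ServiceElement("Service desk", 4),
    ServiceElement("Training courses", 168),
    ServiceElement("Equipment maintenance", 48),
]


def resumption_order(items):
    """Return the elements sorted so the tightest recovery target comes first."""
    return sorted(items, key=lambda e: e.rto_hours)


for rank, element in enumerate(resumption_order(elements), start=1):
    print(f"{rank}. {element.name} (resume within {element.rto_hours} h)")
```

A real analysis would weigh more than one number per element, such as dependencies on shared infrastructure and the impact on users, but sorting by recovery target shows the basic idea of ordering work by time constraint.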

We have our external ISO 22301 audit in the autumn of 2022, so watch this space for further updates.

Project Manager
EPCC
University of Edinburgh