Service Desk Operators' blog


The Service Desk is often the first point of contact between the Service team and the outside world. We asked a couple of recent recruits to the Service Desk team to share their experience and insights into this very important role.

There are two shifts per day 8am-1pm and 1pm-6pm, Monday to Friday, so there is always an operator ready to handle issues during working hours. The operator will work closely with the shift user administrator (Uadmin) who deals with user administration tasks such as project allocations and SAFE queries. Operator will also work closely with the in-depth query team, a group of experienced HPC users who can advise on technical problems and performance issues.

The operator must triage the incoming queries and pass them along to the correct team members, and also respond to any straightforward questions that can be dealt with by the operator themselves. The operators are usually members of the Applications group at EPCC and many also work in the in-depth query group, meaning the operators have experience of who is best placed to deal with different types of technical queries.

How do we handle queries?

Every query that arrives is flagged according to the project that it relates to and the machine (as the same Service Desk covers ARCHER2, Cirrus, EIDF and some other services as well). The type of query is set, depending on whether it is a technical question, relates to resources and allocations, applications for access, or more general advice. The handling team is assigned the query, though this can change as the query is progressed, perhaps being passed to a different team if more support is needed, or other teams can be brought in to address different aspects of the question.

The person sending in the query is kept informed of its progress, from the initial acknowledgement that lets them know that it has been received, to who is working on it and how long we expect it to take to resolve, right through to the final hand off when we believe it to have been completed and invite the user to give us feedback on their experience.

How to write a good query?

There can be many reasons a query is raised and while providing detailed but succinct information is always important, doing so when raising complex technical queries is doubly important. Here are a few recommendations and guidelines to follow to speed up the query handling process that will keep your friendly neighbourhood operator happy:

General queries:

  1. Information such as the ID of the project you are working on, your machine username and the name of the machine where you experienced the issue helps us quickly set up the ticket.

  2. Providing the exact error message you have encountered and the steps to reproduce it is extremely helpful as it allows our team to recreate what may have gone wrong.

  3. Let us know what the issue is stopping you from doing, so that we have context for the issue and any steps you have already tried to resolve the problem.

  4. If you have looked for the answer in our documentation and been unable to find it, been unable to follow the instructions, or found the instructions didn’t work, do please let us know so that we can improve the docs as well as hopefully providing you with a correct and comprehensible answer.

  5. Let us know what you would like the outcome of the query to be e.g. if you want more machine time or disk space, let us know how much, and for how long.

Technical queries:

  1. If you have had an issue with a job running on the machine, the Job ID and the time it started and ended is invaluable information as it allows us to pinpoint when the issue occurred. This allows us to correlate the issue with other activity on the machine.

  2. Please provide copies of your runscripts, input files, output files, any logs, a list of modules you have loaded and the full error message. With all of these, we are able to attempt to reproduce the issue and confirm the problem. Copying the files to a directory in ‘/work/YourProjectID/shared’, and providing a link to this location in the query, allows members of the in-depth query team to access the information they need directly on the system and share updated files or code with you in a collaborative way. This can greatly reduce the time it takes to solve a query.

  3. Always include the name of the code and the links to source control repositories that we can use to copy code to test things ourselves. If we can research the code and install it independently, it speeds up our debugging time. We support a wide variety of codes, but we can’t all be experts in all codes. So, providing us with links to source code lets us gain an understanding of the code before suggesting solutions. This also helps the operators identify repeat issues and trends.

  4. Let us know if you have been able to reproduce the issue in a minimal example e.g. does your MPI code issue persist with a single MPI process?

  5. Provide us with the relevant context so we can better understand the reason you are hitting the problem. If we don’t have the full picture, we may suggest a solution to one part of the problem but not address the bigger issue. Try to briefly summarise what you’re trying to achieve, and how you are going about it. Are other ways you have gone about this that have also hit issues?

  6. Let us know if this is an issue only on ARCHER2 - if you know if the code works on another platform, it can help narrow down the issue to a specific machine or environment setting.

  7. There are many ways we can liaise with you to solve the issue. We offer call backs via a Teams meeting to discuss. You can request this when you send in your query. This doesn’t mean that the issue will necessarily be resolved at the meeting, but it can significantly reduce the iterations required to make progress. Sometimes an actual chat, or sharing your screen, can reduce the time to solution.

Summary

This isn’t an exhaustive summary of all the details of an operator’s role in supporting the ARCHER2 service, or how to write a helpful query, but we hope it may give you a bit more insight into who is at the other end of the ARCHER2 support email inbox and the type of information that will help us to help you as effectively as possible.