A Container Factory for HPC


This blog post follows on from “HPC Containers?”, which showed how to run a containerized GROMACS application across multiple compute nodes. The purpose of this post is to explain how that container was made.

We turn now to the container factory, the environment within which containers are first created and then customised for various HPC platforms. The container factory is a standalone machine providing root-level access. It runs on the Eleanor Research Cloud at the University of Edinburgh. (The factory instance has 8 vCPUs, 16 GB RAM and 160 GB of disk space.)

At the time of writing, the factory OS is Ubuntu 20.04.2 and the container software is (Sylabs) SingularityCE 3.8.1.

You could of course set up a container factory on your personal laptop (or on any machine where you have root access). However, establishing the factory as a cloud-based instance separates that work from any details peculiar to an individual's machine, and the building of the factory itself can be scripted, allowing others to create their own container factories; see https://github.com/mbareford/container-factory.
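
For reference, the core of such a factory setup amounts to installing Go and then building SingularityCE from source. The snippet below is a minimal sketch of those steps for Ubuntu 20.04 and SingularityCE 3.8.1; it is not the repository's provisioning script, and the package list and download URLs are assumptions that may need adjusting.

Factory Setup Sketch
  # minimal sketch: provision a container factory host (not the repository's own script)

  # build dependencies for SingularityCE (package list is an assumption)
  sudo apt-get update
  sudo apt-get install -y build-essential libseccomp-dev pkg-config squashfs-tools cryptsetup wget git

  # install Go, which is required to compile SingularityCE
  wget https://golang.org/dl/go1.16.6.linux-amd64.tar.gz
  sudo tar -C /usr/local -xzf go1.16.6.linux-amd64.tar.gz
  export PATH=/usr/local/go/bin:${PATH}

  # download, build and install SingularityCE 3.8.1
  wget https://github.com/sylabs/singularity/releases/download/v3.8.1/singularity-ce-3.8.1.tar.gz
  tar -xzf singularity-ce-3.8.1.tar.gz
  cd singularity-ce-3.8.1
  ./mconfig && make -C ./builddir && sudo make -C ./builddir install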

The two sections that follow contain many script blocks; these have been added as collapsible items so as to improve readability. At the start of each block is the corresponding path from the Container Factory repository. All subsidiary scripts referenced can also be found within this GitHub project.

Creating the Initial Container Image

The making of an application-specific container takes place within the factory builds folder, e.g., ~/work/builds/gromacs. Presented below is the top-level creation script for the GROMACS code.

GROMACS Creation Script
  # https://github.com/mbareford/container-factory/blob/main/builds/gromacs/create.sh

  #!/bin/bash

  echo "Deleting old images, logs and scripts..."
  rm -f *.sif*
  rm -f *.log
  rm -rf scripts*

  echo "Gathering required scripts..."
  APP=gromacs
  SCRIPTS_ROOT=${HOME}/work/scripts
  SCRIPTS_DEF=${SCRIPTS_ROOT}/def
  SCRIPTS_APP=${SCRIPTS_ROOT}/app/${APP}
  SCRIPTS_SNG=${SCRIPTS_ROOT}/fac/singularity

  mkdir -p ./scripts/aux
  cp ${SCRIPTS_ROOT}/aux/download_src.sh ./scripts/aux/
  cp ${SCRIPTS_ROOT}/aux/install_cmp.sh ./scripts/aux/
  cp ${SCRIPTS_ROOT}/aux/setup_env.sh ./scripts/aux/
  cp ${SCRIPTS_ROOT}/aux/update_env.sh ./scripts/aux/
  cp ${SCRIPTS_ROOT}/aux/add_log.sh ./scripts/aux/
  cp ${SCRIPTS_ROOT}/aux/add_dirs.sh ./scripts/aux/

  mkdir -p ./scripts/chk
  cp ${SCRIPTS_ROOT}/chk/check_os.sh ./scripts/chk/
  cp ${SCRIPTS_ROOT}/chk/check_gcc.sh ./scripts/chk/
  cp ${SCRIPTS_ROOT}/chk/check_cmp.sh ./scripts/chk/

  mkdir -p ./scripts/def
  cp ${SCRIPTS_ROOT}/def/gromacs.def ./scripts/def/

  mkdir -p ./scripts/os
  cp ${SCRIPTS_ROOT}/os/ubuntu-20.04.sh ./scripts/os/

  mkdir -p ./scripts/cmp
  cp -r ${SCRIPTS_ROOT}/cmp/miniconda ./scripts/cmp/
  cp ${SCRIPTS_ROOT}/cmp/cmake.sh ./scripts/cmp/

  mkdir -p ./scripts/app/${APP}
  cp ${SCRIPTS_APP}/source.sh ./scripts/app/${APP}/
  cp ${SCRIPTS_APP}/build.sh ./scripts/app/${APP}/
  cp -r ${SCRIPTS_APP}/host ./scripts/app/${APP}/

  tar -czf scripts.tar.gz ./scripts
  rm -rf scripts

  echo "Creating ${APP} singularity image file..."
  sudo singularity build ${PWD}/${APP}.sif.0 ${SCRIPTS_DEF}/${APP}.def &> create.log

  echo "Adding creation log to image file..."
  ${SCRIPTS_SNG}/add_log.sh ${PWD}/${APP}.sif.0 create log

  echo "Final tidy up..."
  rm create.log
  rm scripts.tar.gz

  echo "Creation complete!"
  echo "${PWD}/${APP}.sif.0"

The key line in the creation script is the one that builds the container image.

sudo singularity build ${PWD}/${APP}.sif.0 ${SCRIPTS_DEF}/${APP}.def &> create.log

It takes as input a Singularity definition file, e.g., ${HOME}/work/scripts/def/gromacs.def.

GROMACS Singularity Definition File
  # https://github.com/mbareford/container-factory/blob/main/scripts/def/gromacs.def

  Bootstrap: library
  From: ubuntu:20.04

  %setup
      # empty

  %files
      ${HOME}/work/scripts/post_start.sh /opt/
      ${HOME}/work/scripts/post_stop.sh /opt/
      ${HOME}/work/builds/gromacs/scripts.tar.gz /opt/

  %environment
      # empty

  %post
      . /opt/post_start.sh

      ubuntu-20.04.sh 10

      miniconda.sh 3 4.8.3 38
      conda_install.sh numpy,scipy,matplotlib

      cmake.sh 3.18.4

      source.sh gromacs 2021.1

      . /opt/post_stop.sh

  %runscript
      # empty

  %startscript
      # empty

  %test
      ROOT=/opt/scripts
      export PATH=${ROOT}/chk:${ROOT}/aux:${ROOT}/os:${ROOT}/cmp:${PATH}
      check_os.sh "Ubuntu 20.04.2"
      check_gcc.sh "10.3.0"
      check_cmp.sh ${MINICONDA3_ROOT} ${MINICONDA3_NAME}
      check_cmp.sh ${CMAKE_ROOT} ${CMAKE_NAME}

  %labels
      Author Michael Bareford
      Email m.bareford@epcc.ed.ac.uk
      Version v1.0.0

  %help
      This GROMACS (http://www.gromacs.org/) container image file was created at the EPCC Container Factory,
      an OpenStack Ubuntu 20.04 instance (ID 859596f3-6683-4951-82d4-f9e080c30d1f) hosted by the University of Edinburgh Eleanor Research Cloud.

      The container is based on Ubuntu 20.04 and features GCC 10.3.0, Miniconda3 4.8.3, CMake 3.18.4 and the GROMACS source code version 2021.1.
      See the container creation log at "/opt/logs/create.log.0" and the original definition file at "/opt/scripts/def/gromacs.def".

      Submission script templates can be found under "/opt/scripts/app/gromacs/host/".
      These script files are named "submit.sh" and are organised by "<host name>/<MPI library>/<compiler>".
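
The %files section copies two helper scripts, post_start.sh and post_stop.sh, into /opt alongside the scripts.tar.gz archive produced by the creation script, and the %post section brackets its commands with them. Their contents are not reproduced in this post, but a plausible sketch of post_start.sh, assuming its job is simply to unpack the archive and extend PATH so that commands such as ubuntu-20.04.sh and cmake.sh resolve by name, is shown below.

post_start.sh Sketch
  # hypothetical sketch of post_start.sh; the real script is part of the Container Factory repository

  # unpack the scripts bundled into the image by the %files section
  tar -xzf /opt/scripts.tar.gz -C /opt

  # make the bundled scripts callable by name from the %post commands
  # (building PATH from every scripts subdirectory is an assumption)
  ROOT=/opt/scripts
  for dir in $(find ${ROOT} -type d); do
    PATH=${dir}:${PATH}
  done
  export PATH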

The %post section of the definition file specifies the container OS (Ubuntu 20.04 in this case) as well as the GCC compiler (major) version. Subsequent commands install Miniconda3, CMake 3.18.4 and the GROMACS 2021.1 source code. The installation of the GROMACS source is handled by a simple script called source.sh.

GROMACS Source Script
  # https://github.com/mbareford/container-factory/blob/main/scripts/app/gromacs/source.sh

  #!/bin/bash
  
  LABEL=$1
  VERSION=$2
  NAME=${LABEL}-${VERSION}
  ROOT=/opt/app/${LABEL}

  mkdir -p ${ROOT}
  cd ${ROOT}

  wget https://ftp.gromacs.org/${LABEL}/${NAME}.tar.gz
  tar -xzf ${NAME}.tar.gz
  rm ${NAME}.tar.gz

Information on the other sections listed in the definition file (e.g., %files, %test and %help) can be found in the SingularityCE User Guide.

The creation phase should result in an image file such as gromacs.sif.0, which can be inspected by running the command below.

singularity inspect -H gromacs.sif.0

This GROMACS (http://www.gromacs.org/) container image file was created at the EPCC Container Factory,
an OpenStack Ubuntu 20.04 instance (ID 859596f3-6683-4951-82d4-f9e080c30d1f) hosted by the University of Edinburgh Eleanor Research Cloud.

The container is based on Ubuntu 20.04 and features GCC 10.3.0, Miniconda3 4.8.3, CMake 3.18.4 and the GROMACS source code version 2021.1.
See the container creation log at "/opt/logs/create.log.0" and the original definition file at "/opt/scripts/def/gromacs.def".

Submission script templates can be found under "/opt/scripts/app/gromacs/host/".
These script files are named "submit.sh" and are organised by "<host name>/<MPI library>/<compiler>".

You can see that the text returned was provided by the Singularity definition file. It is reproduced here to illustrate the principle that the provenance of a container should always be accessible via the singularity inspect -H command. Note that the text gives the paths to the original definition file as well as to the output generated by running sudo singularity build .... Those two files are trivial to access.

singularity exec gromacs.sif.0 cat /opt/scripts/def/gromacs.def
singularity exec gromacs.sif.0 cat /opt/logs/create.log.0

From now on, this provenance history grows every time the containerized application is built on (or targeted at) an HPC platform.
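
For example, listing the logs directory inside the image shows the records accumulated so far; at this stage it holds only the creation log, but each targeting run described below adds a make log alongside it.

singularity exec gromacs.sif.0 ls /opt/logs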

Targeting the Container

The command below starts the process that compiles the containerized GROMACS source on the ARCHER2 4cab system. The final string argument specifies the GROMACS version (2021.1), the host MPI library (Cray MPICH v8) and the compiler (GCC v10).

~/work/scripts/fac/singularity/target.sh ~/work/scripts ${PWD} gromacs archer2 "/work/z19/z19/mrb4cab/containers/build" "2021.1 cmpich8-ofi gcc10"

The target.sh script is actually quite simple: the container image file is uploaded to the HPC platform (the target), a deployment script is run and then a new container image file is downloaded back to the factory.

Targeting Script
  # https://github.com/mbareford/container-factory/blob/main/scripts/fac/singularity/target.sh

  #!/bin/bash

  SCRIPTS_ROOT=$1
  IMG_PATH=$2
  APP=$3
  HOST=$4
  DEPLOY_PATH=$5
  DEPLOY_ARGS="${APP} ${DEPLOY_PATH}/${APP}.sif \"$6\""
  DEPLOY_SCRIPT=${SCRIPTS_ROOT}/app/${APP}/host/${HOST}/deploy.sh

  . ${SCRIPTS_ROOT}/fac/singularity/get_latest_suffix.sh
  get_latest_suffix ${IMG_PATH} ${APP}
  next_suffix=`expr ${suffix} + 1`

  echo "Uploading ${APP} singularity image to ${HOST} host..."
  scp ${IMG_PATH}/${APP}.sif.${suffix} ${HOST}:${DEPLOY_PATH}/${APP}.sif

  echo "Running the deployment script that builds a containerized ${APP} app on the ${HOST} host..."
  ssh ${HOST} "bash -ls" < ${DEPLOY_SCRIPT} ${DEPLOY_ARGS}

  echo "Downloading new ${APP} singularity image from ${HOST} host..."
  scp ${HOST}:${DEPLOY_PATH}/${APP}.sif ${IMG_PATH}/${APP}.sif.${next_suffix}

  echo "Deleting the ${APP} singularity image left on ${HOST} host..."
  ssh ${HOST} rm -f ${DEPLOY_PATH}/${APP}.sif

  echo "Targeting complete!"
  echo "${IMG_PATH}/${APP}.sif.${next_suffix}"

What is the deployment script? It is a short script that is executed on the target (the HPC host).

GROMACS ARCHER2 Deployment Script
  # https://github.com/mbareford/container-factory/blob/main/scripts/app/gromacs/host/archer2/deploy.sh

  #!/bin/bash
  
  APP=$1
  SIF=$2
  HOST=archer2
  BUILD_ARGS="${HOST} $3"
  BIND_ARGS=`singularity exec ${SIF} cat /opt/scripts/app/${APP}/host/${HOST}/bindpaths.lst`

  echo "Converting ${APP} container image to sandbox..."
  singularity build --sandbox ${SIF}.sandbox ${SIF}
  echo ""

  echo "Building ${APP} within container sandbox..."
  singularity exec -B ${BIND_ARGS} --writable ${SIF}.sandbox /opt/scripts/app/${APP}/build.sh ${BUILD_ARGS}
  echo ""

  echo "Converting ${APP} container sandbox back to image..."
  singularity build --force ${SIF} ${SIF}.sandbox
  echo ""

  echo "Deleting ${APP} container sandbox..."
  rm -rf ${SIF}.sandbox

Near the start of the deployment, the bind paths (the links between the container and host file systems) are extracted from the container image. These paths are needed when the containerized application is built and they differ for each HPC host. Presented below are the bind paths for the ARCHER2 4cab system.

# https://github.com/mbareford/container-factory/blob/main/scripts/app/gromacs/host/archer2/bindpaths.lst

/work/y07/shared,/opt/cray,/usr/lib64:/usr/lib64/host,/etc/libibverbs.d

You can see that the bind paths are given as a comma-separated list. The syntax for a single bind path follows src[:dest[:opts]], where src and dest are, respectively, the path outside the container (on the host) and the path inside the container. If dest is not given, it is set equal to src. Lastly, the opts setting is rw by default.
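
So, for instance, the ARCHER2 entry /usr/lib64:/usr/lib64/host mounts the host's /usr/lib64 at /usr/lib64/host inside the container, and a read-only mount could be requested by appending :ro. The command below is purely illustrative of the syntax.

singularity exec -B /work/y07/shared,/usr/lib64:/usr/lib64/host:ro gromacs.sif ls /usr/lib64/host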

Returning to the deployment script, the container image is now converted to a sandbox in preparation for building the (GROMACS) code. Container images are immutable objects; this conflicts with the fact that a code compilation will add new files to the container. The solution is to convert the container to a sandbox directory and then use the --writable flag when building the code within the sandbox. Once the build has completed, the sandboxed container is converted back to an image file (in fact, the deployment script uses the --force flag to ensure that the original image file is overwritten).

The GROMACS build script (called from deploy.sh) is shown below — the cmake command has been abbreviated for clarity.

GROMACS Build Script
  # https://github.com/mbareford/container-factory/blob/main/scripts/app/gromacs/build.sh

  #!/bin/bash

  HOST=$1
  VERSION=$2
  MPI_LABEL=$3
  COMPILER_LABEL=$4

  LABEL=gromacs
  NAME=${LABEL}-${VERSION}
  ROOT=/opt/app/${LABEL}
  HOST_PATH=${HOST}/${MPI_LABEL}/${COMPILER_LABEL}
  SCRIPTS_ROOT=/opt/scripts/app/${LABEL}/host/${HOST_PATH}
  BUILD_ROOT=${ROOT}/${NAME}
  INSTALL_ROOT=${ROOT}/${VERSION}/${HOST_PATH}
  LOG_ROOT=/opt/logs
  CMAKE_PRELOAD=/lib/x86_64-linux-gnu/libssl.so.1.1:/lib/x86_64-linux-gnu/libcrypto.so.1.1

  # set the build environment
  . ${SCRIPTS_ROOT}/env.sh

  # set make log name
  mkdir -p ${LOG_ROOT}
  if [ -f "${LOG_ROOT}/.make" ]; then
    makecnt=`cat ${LOG_ROOT}/.make`
    makecnt=`expr ${makecnt} + 1`
  else
    makecnt="1"
  fi
  MAKE_LOG=${LOG_ROOT}/make.log.${makecnt}
  echo "${makecnt}" > ${LOG_ROOT}/.make

  # set compiler and build flags
  export FLAGS="-O3 -ftree-vectorize -funroll-loops"

  # build
  BUILD_PATH=${BUILD_ROOT}/build/${HOST_PATH}/single
  rm -rf ${BUILD_PATH}
  mkdir -p ${BUILD_PATH}
  cd ${BUILD_PATH}

  LD_PRELOAD=${CMAKE_PRELOAD} cmake ${BUILD_ROOT} \
      -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_HWLOC=OFF -DGMX_GPU=OFF \
      ...
      -DCMAKE_INSTALL_PREFIX=${INSTALL_ROOT} &> ${MAKE_LOG}

  LD_PRELOAD=${CMAKE_PRELOAD} make install &>> ${MAKE_LOG}

  # record
  currentDateTime=`date +"%Y-%m-%d %T"`
  echo "    ${currentDateTime}: Built ${LABEL} ${VERSION} (${MPI_LABEL}-${COMPILER_LABEL}) on ${HOST}" >> /.singularity.d/runscript.help
  echo "                         (${MAKE_LOG})" >> /.singularity.d/runscript.help
  echo "" >> /.singularity.d/runscript.help

At the beginning of the build script, many environment variables are initialised; a further set of variables is initialised by sourcing a second script called env.sh, located within the container on a path that identifies a particular combination of HPC host, MPI library and compiler, e.g., /opt/scripts/app/gromacs/host/archer2/cmpich8-ofi/gcc10. The content of env.sh for that particular example is reproduced below.

# https://github.com/mbareford/container-factory/blob/main/scripts/app/gromacs/host/archer2/cmpich8-ofi/gcc10/env.sh

MPI_ROOT=/opt/cray/pe/mpich/8.0.16/ofi/gnu/9.1
MPI_C_LIB=mpi
MPI_CXX_LIB=mpi
MPI_LIBRARY_PATH=${MPI_ROOT}/lib/libmpi.so
FFTW_ROOT=/opt/cray/pe/fftw/3.3.8.8/x86_rome
LIBSCI_ROOT=/opt/cray/pe/libsci/20.10.1.2/GNU/9.1/x86_64
BLAS_LIBRARIES=${LIBSCI_ROOT}/lib/libsci_gnu_82_mpi_mp.so
LAPACK_LIBRARIES=${BLAS_LIBRARIES}
LD_LIBRARY_PATH=${FFTW_ROOT}/lib:${LIBSCI_ROOT}/lib:${MPI_ROOT}/lib:\
/opt/cray/pe/lib64:/opt/cray/libfabric/1.11.0.0.233/lib64:\
/usr/lib64/host:/usr/lib64/host/libibverbs:\
/lib/x86_64-linux-gnu:/.singularity.d/libs
CC=gcc
CXX=g++

The sourcing of env.sh enables the cmake and make commands to find the headers and libraries required to build the containerized application.
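
The cmake invocation in build.sh was abbreviated above; to make the link explicit, the fragment below sketches how variables of this kind could be passed through to a GROMACS configure step. The flag selection is a hypothetical illustration, not the elided portion of the actual script.

Hypothetical CMake Fragment
  # illustrative only: wiring env.sh variables into a GROMACS cmake configure step

  cmake ${BUILD_ROOT} \
      -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} \
      -DGMX_FFT_LIBRARY=fftw3 \
      -DFFTWF_INCLUDE_DIR=${FFTW_ROOT}/include \
      -DFFTWF_LIBRARY=${FFTW_ROOT}/lib/libfftw3f.so \
      -DGMX_BLAS_USER=${BLAS_LIBRARIES} \
      -DGMX_LAPACK_USER=${LAPACK_LIBRARIES} \
      -DCMAKE_INSTALL_PREFIX=${INSTALL_ROOT}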

Notice also that the build.sh script takes some care to ensure that the make output is directed to an appropriately named log file. As mentioned earlier, this is so that a container's history can be accessed via the singularity inspect command, as shown below.

singularity inspect -H gromacs.sif.2

This GROMACS (http://www.gromacs.org/) container image file was created at the EPCC Container Factory,
an OpenStack Ubuntu 20.04 instance (ID 859596f3-6683-4951-82d4-f9e080c30d1f) hosted by the University of Edinburgh Eleanor Research Cloud.

The container is based on Ubuntu 20.04 and features GCC 10.3.0, Miniconda3 4.8.3, CMake 3.18.4 and the GROMACS source code version 2021.1.
See the container creation log at "/opt/logs/create.log.0" and the original definition file at "/opt/scripts/def/gromacs.def".

Submission script templates can be found under "/opt/scripts/app/gromacs/host/".
These script files are named "submit.sh" and are organised by "<host name>/<MPI library>/<compiler>".


2021-05-25 12:39:53: Built gromacs 2021.1 (cmpich8-ofi-gcc10) on archer2
                     (/opt/logs/make.log.1)

2021-05-25 13:42:56: Built gromacs 2021.1 (ompi4-ofi-gcc10) on archer2
                     (/opt/logs/make.log.2)

We see that the GROMACS code has been targeted twice at the ARCHER2 platform, once using Cray MPICH v8 and again using OpenMPI v4. Both of these events are time stamped and the container-based paths to the make logs are indicated. In this way, possession of a container image file should be sufficient for determining the HPC platforms on which the containerized application is expected to run. Note that batch submission script templates also exist within the container.
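
A user who only has the image file can therefore also recover a suitable submission script; for example, the template for the Cray MPICH / GCC build on ARCHER2 could be copied out of the container with a command along the lines of the one below, the path following the "<host name>/<MPI library>/<compiler>" convention described in the help text.

singularity exec gromacs.sif.2 cat /opt/scripts/app/gromacs/host/archer2/cmpich8-ofi/gcc10/submit.sh > submit.sh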

It is apparent that this targeting workflow is somewhat complex, involving many Bash scripts, most of which are executed at the factory. The exceptions are the deployment script, which is run on the HPC host, and the build script, which is run within the sandboxed Singularity container on the HPC host.

Other codes such as CASTEP and RAMSES have been containerized in a similar manner and run successfully on ARCHER2. These successes suggest that the design implicit in the Container Factory script hierarchy will be flexible enough to accommodate more parallel codes (and more HPC platforms).

Addendum

Recent versions of Singularity (>= 3.7.x) may introduce a further complication: the need to create host-specific file paths within the container before the targeting process can begin. This isn't currently an issue with the ARCHER2 4cab system, as the version of Singularity installed on that platform is 3.5.3-1, but the Tier-2 Cirrus machine has Singularity v3.7.2-1. And so, targeting the GROMACS container at Cirrus first requires the creation of the /lustre/sw, /opt/hpe and /etc/libibverbs.d paths in order to support the use of the various bind paths specified in the accompanying Cirrus deploy.sh script. This extra step is handled at the factory by target_init.sh.
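
The target_init.sh script itself is not reproduced in this post. In outline, it only has to make the missing mount points exist inside the (otherwise immutable) image, which can be done with the same sandbox round trip used during deployment; a hypothetical sketch is given below.

target_init.sh Sketch
  #!/bin/bash

  # hypothetical sketch of target_init.sh: pre-create host-specific bind paths inside an image
  # illustrative usage: target_init.sh gromacs.sif.0 /lustre/sw /opt/hpe /etc/libibverbs.d

  SIF=$1
  shift

  echo "Converting image to sandbox..."
  sudo singularity build --sandbox ${SIF}.sandbox ${SIF}

  echo "Creating host-specific paths..."
  for path in "$@"; do
    sudo mkdir -p ${SIF}.sandbox${path}
  done

  echo "Converting sandbox back to image..."
  sudo singularity build --force ${SIF} ${SIF}.sandbox
  sudo rm -rf ${SIF}.sandbox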