Run tests on a SLURM cluster
Practice running parallel test jobs on a makeshift SLURM cluster, built from Docker containers with OpenFOAM installed.
Now that you're feeling more confident about reading/writing MPI-ready code in OpenFOAM, it's time to put what you've learned so far to work. In this activity, you'll run the unit tests from the previous exercises (normally driven by the Alltest script) on a SLURM cluster, which emulates a real-world cluster but is made up of Docker containers instead of physical machines for convenience.
Get a makeshift cluster running
First, let's start with the requirements (avoid distribution repositories for these):
- Docker must be installed. Check the get.docker.com script.
- You also need a recent version of docker-compose (a quick check of both is sketched below).
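Here is a minimal sketch of how you could confirm both tools are available; the exact compose command depends on how you installed it:
# On your host machine, check the installed versions
docker --version
# Newer installs ship the compose plugin; older ones use the standalone binary
docker compose version || docker-compose --version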
To make a decent test SLURM cluster, you need a machine with at least 4 CPUs. From there, it’s a matter of cloning my already configured repository and installing required software:
The docker-openfoam-slurm-cluster directory we're creating here can live wherever you want on your machine.
git clone https://github.com/FoamScience/docker-openfoam-slurm-cluster
cd docker-openfoam-slurm-cluster
# Build containers for different cluster nodes - See the diagram below
# All nodes are based on CentOS 7 with OpenFOAM v2206 installed
# This takes around 4GB of disk space and some time to complete
docker-compose build
# Fire up the cluster
docker-compose up -d
The previous commands will result in the creation of a SLURM cluster with the following architecture (we're ignoring a container dedicated to hosting databases, as it's not important for our purposes):
- You submit jobs on the head-node.
- The head-node controls job execution on the 4 compute nodes.
- To avoid excessive file copying, var/axc (inside the docker-openfoam-slurm-cluster directory) on your local machine is mounted to /axc on all cluster nodes, so every node gets access to anything you put in there (see the sanity check below).
- root is the default user for all operations inside the cluster.
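As a quick sanity check, a sketch like the following (run from your host) should confirm that the containers are up and that the shared directory is really visible from the head-node; the hello.txt file name is just an example:
# List the running cluster containers
docker-compose ps
# Drop a file into the shared directory on the host ...
echo "hello from the host" > var/axc/hello.txt
# ... and read it back from inside the head-node
docker exec axc-headnode cat /axc/hello.txt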
It's also good to do a pre-flight check to see if everything is working as expected (what's important is being able to perform mpirun calls):
# Gain a root shell at the head-node
docker exec -it axc-headnode bash
# Source the OpenFOAM env. on the container
(axc-headnode) source /usr/lib/openfoam/openfoam2206/etc/bashrc
# Try an MPI job on all 4 compute nodes.
# This should report 4 different IP addresses
# Note the --allow-run-as-root
# And note that mpirun does not need -np because it's built with SLURM support
(axc-headnode) salloc -N 4 mpirun --allow-run-as-root hostname -I
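If you want an extra check on the SLURM side, something like the following should also work from the same head-node shell; sinfo lists the compute nodes and their state, and srun launches a trivial job step on all of them:
# Show the partitions and the state of the 4 compute nodes
(axc-headnode) sinfo
# Run a trivial command on all 4 nodes through SLURM directly
(axc-headnode) srun -N 4 hostname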
Compile your test driver and prepare your case
Now that we have verified that we can submit mpirun jobs to SLURM, we can attempt to compile the test driver (foamUT dependencies are already compiled and installed in the right places on the nodes):
# On your host machine, clone foamUT to the shared directory:
git clone https://github.com/FoamScience/foamUT var/axc/foamUT
# Access a shell at the head node:
docker exec -it axc-headnode bash
# On the head node:
(axc-headnode) source /usr/lib/openfoam/openfoam2206/etc/bashrc
(axc-headnode) cd /axc/foamUT
(axc-headnode) # Install cmake v3.x
(axc-headnode) wget https://cmake.org/files/v3.12/cmake-3.12.3.tar.gz
(axc-headnode) tar zxvf cmake-3.*
(axc-headnode) cd cmake-3.*
(axc-headnode) ./bootstrap --prefix=/usr/local
(axc-headnode) make -j$(nproc)
(axc-headnode) make install
(axc-headnode) cd ..
(axc-headnode) export FOAM_FOAMUT=$PWD
(axc-headnode) # CentOS keeps libraries under lib64, so adjust the example tests' Make/options accordingly
(axc-headnode) sed -i 's_/lib_/lib64_g' tests/exampleTests/Make/options
(axc-headnode) ./Alltest
# Subsequent compilations:
(axc-headnode) cd tests/exampleTests && wmake
The resulting binary (/axc/foamUT/tests/exampleTests/testDriver) also stays inside this shared directory, so compiling on one of the CentOS containers is enough (since they are identical).
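Before moving on, it doesn't hurt to verify the driver binary; a small sketch, assuming the build above succeeded:
# The binary lives in the shared directory, so any node can see it
(axc-headnode) ls -l /axc/foamUT/tests/exampleTests/testDriver
# Check that it resolves its OpenFOAM and MPI shared libraries
(axc-headnode) ldd /axc/foamUT/tests/exampleTests/testDriver | grep -i 'not found' || echo "all libraries resolved"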
We'll also be using the cavity case provided with foamUT (you can do this on the head node):
# Copy the case
(axc-headnode) cp -r /axc/foamUT/cases/cavity /axc/testCase
(axc-headnode) cd /axc/testCase
# Create the mesh and decompose it
(axc-headnode) blockMesh
(axc-headnode) decomposePar
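Since we plan to run on 4 compute nodes, it's worth double-checking that the case is decomposed into 4 subdomains; a quick sketch, assuming the bundled case ships a standard system/decomposeParDict:
# The number of subdomains should match the 4 MPI ranks we'll request
(axc-headnode) grep numberOfSubdomains system/decomposeParDict
# decomposePar creates one processor* directory per subdomain
(axc-headnode) ls -d processor*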
Submit a SLURM job to run example tests on the prepared case
To submit a simulation job, we first need to understand how the test driver works:
testDriver [catch_options] --- [openfoam_options]
So, to run the parallel tests in parallel on the testCase case (this is normally handled by the Alltest script):
# --allow-run-as-root needed because mpirun will run as root
# and don't forget the -parallel flag
(axc-headnode) salloc -N 4 mpirun --allow-run-as-root \
/axc/foamUT/tests/exampleTests/testDriver '[parallel]' \
--- \
-case /axc/testCase -parallel
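If you prefer batch submission over the interactive salloc call, here is a hedged sketch of an equivalent sbatch script; the file name run_tests.sbatch is just an example:
# Write a batch script into the shared directory
(axc-headnode) cat > /axc/run_tests.sbatch << 'EOF'
#!/bin/bash
#SBATCH -N 4
#SBATCH -J foamUT-parallel
source /usr/lib/openfoam/openfoam2206/etc/bashrc
mpirun --allow-run-as-root \
    /axc/foamUT/tests/exampleTests/testDriver '[parallel]' \
    --- \
    -case /axc/testCase -parallel
EOF
# Submit it and check the queue
(axc-headnode) sbatch /axc/run_tests.sbatch
(axc-headnode) squeue
SLURM then writes the test output to a slurm-<jobid>.out file in the directory you submitted from.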
Can we run Alltest on the SLURM cluster?
Sure we can; all we have to do is replace mpirun -np "$nProcs" with salloc -N "$nProcs" mpirun:
# This compiles only on head node, but runs tests on nProcs compute nodes
sed -Ei 's/mpirun (.*) -np "\$nProcs"/salloc -N "\$nProcs" mpirun \1/g' Alltest
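A quick way to confirm the substitution landed where expected:
# The parallel run line(s) in Alltest should now start with salloc
(axc-headnode) grep -n 'salloc -N' Alltest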
Whether the tests pass for us or not is not as important as paying attention to the output:
Case : /axc/testCase
nProcs : 4
Hosts :
(
(axc-compute-01 1)
(axc-compute-02 1)
(axc-compute-03 1)
(axc-compute-04 1)
)
and making sure every compute node is participating with 1 CPU, which proves that our training cluster is working as expected.
When you're done, stop or tear down the cluster:
# On your host machine
# Make the cluster go offline without removing containers
docker-compose stop
# Bring down the cluster (stop and remove containers)
docker-compose down
If you need it, here is a short cheatsheet for SLURM commands.