
SLURM srun

Running multiple jobs in parallel

Submitting a job can be done easily with sbatch job.sbatch, where job.sbatch may contain the following. See SlurmMD or the srun man page.

#!/bin/bash
#SBATCH --partition=main
#SBATCH -N4
srun -N1 -n1 --exclusive job-step.sh &

This job will allocate 4 nodes on the 'main' partition (so that, for example, you can run 4 job steps in parallel), and each job step will be allocated one node. Note that it is necessary to run 'srun' in the background (&) so that the job steps run in parallel. Be aware, however, that job steps still running in the background will be killed when the batch script exits.

WARNING: putting all your 'srun' calls in the background will kill your standard and error output!!! You should not put your last call to 'srun' in the background!!!

#!/bin/bash
#SBATCH --partition=main
#SBATCH -N4
srun -N1 -n1 --exclusive job-step_1.sh &
srun -N1 -n1 --exclusive job-step_2.sh

Submitting jobs with exclusive access

For benchmarking purposes, you may want to block other jobs from your node(s). Without the '--exclusive' option to the 'sbatch' command, the scheduler will place multiple job steps on a single node, which makes the node useless for benchmarking. If exclusive access is required, you can add the option '--exclusive'; this is similar to specifying '--cpus-per-task=n', where n is the number of CPU cores on that node. Allocating all CPU cores on a node gives you exclusive access to that node. More details can be read here: Support for Multi-core/Multi-thread Architectures

#!/bin/bash
#SBATCH --partition=m610
#SBATCH -N4
#SBATCH --exclusive
srun -N1 -n1 --exclusive job-step.sh &
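
Alternatively, a rough sketch of the '--cpus-per-task' variant mentioned above, assuming the nodes in this partition have 16 CPU cores (adjust the value to your hardware):

#!/bin/bash
#SBATCH --partition=m610
#SBATCH -N4
#SBATCH --cpus-per-task=16
srun -N1 -n1 --exclusive job-step.sh &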

Scheduling umpteen job steps

According to many docs which can be found online, SLURM should be able to schedule tens of thousands of job steps in seconds[1]. However, a few hours after your jobs have started, the 'srun' command may report many errors.

Note that in this case some job steps will not be scheduled! The solution to this problem is to schedule fewer job steps in more jobs. We have configured a maximum number of job steps per job and a maximum number of jobs which seem to work well. We suggest retrieving the current values from the SLURM config and using them in a shell script.
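
A minimal sketch of retrieving such limits, assuming the relevant parameters are MaxStepCount (maximum number of job steps per job) and MaxJobCount (maximum number of jobs) in the SLURM configuration:

# query the limits from the running SLURM configuration
MAX_STEPS=$(scontrol show config | awk '/MaxStepCount/ {print $3}')
MAX_JOBS=$(scontrol show config | awk '/MaxJobCount/ {print $3}')
echo "at most $MAX_STEPS steps per job, at most $MAX_JOBS jobs"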

Attached (gen-experiments.wiki) you can find an example BASH script that splits job steps over multiple jobs. This example is used to run multiple models on some LTSmin binaries. The script was designed when it was unclear which models (Promela, DVE or mCRL2) would benefit from certain changes to LTSmin, so it runs every known model.
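
As a rough illustration of the same idea (a sketch, not the attached script), the following splits a hypothetical commands.txt, containing one job-step command per line, into several jobs of at most MAX_STEPS steps each:

#!/bin/bash
# split job steps over multiple jobs, at most MAX_STEPS steps per job
MAX_STEPS=${MAX_STEPS:-1000}
split -l "$MAX_STEPS" commands.txt chunk-
for chunk in chunk-*; do
    {
        echo '#!/bin/bash'
        echo '#SBATCH --partition=main'
        echo '#SBATCH -N4'
        # prefix every line with srun; background every step except the last one
        sed -e 's|^|srun -N1 -n1 --exclusive |' -e '$!s|$| \&|' "$chunk"
    } > "$chunk.sbatch"
    sbatch "$chunk.sbatch"
done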

Submitting jobs using features or generic consumable resources

If a certain feature is required to run your jobs, you can easily add the --constraint="feature" argument to the command that is being used to submit your job:

# using the Geforce Titan-X gpu(s)
srun -N1 --constraint="titan-x" --gres=gpu:1 job-gpu.sh &
# using the Tesla P100 gpu(s)
srun -N1 --constraint="p100" --gres=gpu:1 job-gpu.sh &

or if a generic consumable resource is required to run your jobs, you can add the --gres="resource" argument to the command that is being used to submit your job:

# one gpu
srun -N1 --gres=gpu:1 job-gpu.sh &
# two gpus
srun -N1 --gres=gpu:2 job-gpu.sh &

Combining features and generic consumable resources is also possible.
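
For example, to request one node matching the 'p100' constraint together with two GPUs (a sketch reusing the names from the examples above):

# using two Tesla P100 gpus on a p100 node
srun -N1 --constraint="p100" --gres=gpu:2 job-gpu.sh &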

Submitting jobs within a reservation window

When a reservation is being used to run your jobs, you can add the --reservation="reservation_name" argument to the command that is being used to submit your job:

# using reservation "project-x"
srun -N1 --reservation="project-x" job-gpu.sh &

Your job will then only run during the reservation window and on the reserved resources.
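
The same option can also be used as a directive inside a batch script, e.g. (a sketch reusing the reservation name from above):

#!/bin/bash
#SBATCH --reservation=project-x
srun -N1 job-step.sh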

Verifying the successful execution of job steps

If the SLURM control daemon is too busy, it sometimes cancels the execution of a job step. To verify whether all job steps have completed, you can issue the following command, assuming your SLURM log (stderr and stdout) is slurm.log.

cat -n slurm.log | grep -v created | grep -v disabled | grep srun

If the output does not contain any error messages from 'srun', you may assume all job steps have completed successfully.
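
In addition to grepping the log, a sketch using 'sacct' to list the state of every job step, assuming your job id is 123456:

# job steps that were cancelled or failed show a State other than COMPLETED
sacct -j 123456 --format=JobID,JobName,State,ExitCode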

Canceling multiple jobs

The 'scancel' command does not support canceling multiple jobs at once; however, you can pipe formatted output from 'squeue' to 'xargs', e.g.:

squeue -p m610 -u meijerjjg -o "%i" | xargs -I{} scancel {}

Interactive jobs

sinteractive is a tiny wrapper around srun to create interactive jobs quickly and easily. It allows you to get a shell on one of the nodes, with limits similar to those of a normal job. To use it, simply run:

sinteractive -c <num_cpus> --mem <amount_mem> --time <minutes> -p <partition>

You will then be presented with a new shell prompt on one of the compute nodes (run 'hostname' to see which one). From there, you can test your code interactively.
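
For example, to get a 4-core shell with 4 GB of memory for one hour (the partition name 'main' is only an assumption for your cluster):

sinteractive -c 4 --mem 4G --time 60 -p main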

Be advised, though: not filling in the above fields will get you a shell with 1 CPU and 100 MB of RAM for 1 hour. This is useful for quick testing.

The source of sinteractive is here:

sinteractive
#!/bin/bash
# run a single-task interactive shell on one node;
# -I60 (--immediate=60) gives up if resources are not allocated within 60 seconds
srun "$@" -I60 -N 1 -n 1 --pty bash -i