SLURM srun
Running multiple jobs in parallel
Submitting a job can be done easily with sbatch job.sbatch, where job.sbatch may contain the following. See SlurmMD or the srun man page.
#SBATCH --partition=main
#SBATCH -N4

srun -N1 -n1 --exclusive job-step.sh &
This job will allocate 4 nodes on the 'main' partition (so that, for example, you can run 4 job steps in parallel), and each 'srun' job step will allocate one of those nodes. Note that it is necessary to run 'srun' in the background (&), otherwise the job steps will not run in parallel.
WARNING: putting every job step in the background makes the batch script exit immediately, which kills the remaining steps and loses their standard output and error output! You should not put your last call to 'srun' in the background:
#SBATCH --partition=main
#SBATCH -N4

srun -N1 -n1 --exclusive job-step_1.sh &
srun -N1 -n1 --exclusive job-step_2.sh
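A commonly used alternative (a sketch, not part of the original example) is to put every job step in the background and end the batch script with 'wait', so the script only exits once all steps have finished:

#SBATCH --partition=main
#SBATCH -N4

srun -N1 -n1 --exclusive job-step_1.sh &
srun -N1 -n1 --exclusive job-step_2.sh &
# 'wait' blocks until all backgrounded job steps are done,
# so the batch script (and its output) is not cut short.
wait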
Submitting jobs with exclusive access
For benchmarking purposes you may want to block other jobs from running on your node(s). Without the exclusive option to the 'sbatch' command, the scheduler may place job steps from other jobs on the same node, which makes the node useless for benchmarking. If exclusive access is required you can add the option "--exclusive"; this is similar to the option "--cpus-per-task=*n*", where *n* is the number of CPU cores on the node, because allocating all CPU cores on a node gives you exclusive access to it. More details can be read here: Support for Multi-core/Multi-thread Architectures
#SBATCH --partition=m610
#SBATCH -N4
#SBATCH --exclusive

srun -N1 -n1 --exclusive job-step.sh &
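The "--cpus-per-task" alternative mentioned above would look roughly as follows (a sketch; the value 16 is only an assumption, replace it with the actual number of cores on the node):

#SBATCH --partition=m610
#SBATCH -N1
#SBATCH --cpus-per-task=16   # assumption: the node has 16 CPU cores

srun job-step.sh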
Scheduling umpteen job steps
According to many documents found online, SLURM should be able to schedule tens of thousands of job steps in seconds[1]. However, a few hours after your jobs have started, many errors will be reported by the 'srun' command, e.g.:
- srun: error: slurm_receive_msg: Socket timed out on send/recv operation
- srun: error: authentication: Socket communication error
- srun: error: Unable to create job step: Protocol authentication error
Note that in this case some job steps will not be scheduled! The solution to this problem is to schedule fewer job steps spread over more jobs. We have configured a maximum number of job steps per job and a maximum number of jobs that seem to work well. We suggest retrieving the current values from the SLURM config and then using these values in a shell script (see the sketch after this list), e.g. with:
- “scontrol show config | grep MaxStepCount | grep -Eo '[0-9]+'”, and
- “scontrol show config | grep MaxJobCount | grep -Eo '[0-9]+'”.
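A minimal sketch of how these values could be used (the file all-steps.txt is only a placeholder for your own list of job-step commands, one per line):

#!/bin/bash
# Read the configured limits from the SLURM config.
MAX_STEPS=$(scontrol show config | grep MaxStepCount | grep -Eo '[0-9]+')
MAX_JOBS=$(scontrol show config | grep MaxJobCount | grep -Eo '[0-9]+')
echo "limits: ${MAX_STEPS} job steps per job, ${MAX_JOBS} jobs"

# Split the list of job-step commands into chunks of at most MAX_STEPS lines
# and submit one job per chunk; keeping the number of chunks below MAX_JOBS
# is left out of this sketch.
split -l "${MAX_STEPS}" all-steps.txt chunk-
for chunk in chunk-*; do
    sbatch --wrap "while read step; do srun -N1 -n1 --exclusive \$step & done < ${chunk}; wait"
done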
Attached (gen-experiments.wiki) one can find an example BASH script that splits job steps over multiple jobs. The example is used to run multiple models on some LTSmin binaries. The script was designed when it was unclear which models (Promela, DVE or mCRL2) would benefit from some changes to LTSmin, so it simply runs every known model.
Submitting jobs using features or generic consumable resources
If a certain feature is required to run your jobs, you can easily add the --constraint="feature" argument to the command used to submit your job:
# using the GeForce Titan-X gpu(s)
srun -N1 --constraint="titan-x" --gres=gpu:1 job-gpu.sh &
# using the Tesla P100 gpu(s)
srun -N1 --constraint="p100" --gres=gpu:1 job-gpu.sh &
Or, if a generic consumable resource is required to run your jobs, you can add the --gres="resource" argument to the command used to submit your job:
# one gpu
srun -N1 --gres=gpu:1 job-gpu.sh &
# two gpus
srun -N1 --gres=gpu:2 job-gpu.sh &
Combining features and generic consumable resources is also possible, for example as sketched below.
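A sketch of such a combination (assuming the 'p100' feature and the gpu resource are both configured as in the examples above):

# two gpus on a node with the Tesla P100 feature
srun -N1 --constraint="p100" --gres=gpu:2 job-gpu.sh &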
Submitting jobs within a reservation window
When a reservation is being used to run your jobs, you can add the --reservation="reservation_name" argument to the command used to submit your job:
# using reservation "project-x"
srun -N1 --reservation="project-x" job-gpu.sh &
Your job will only run during your reservation time and on the required resources.
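The same argument also works in a batch script, for example (a sketch, assuming the reservation "project-x" exists):

#SBATCH --reservation=project-x
#SBATCH -N1

srun job-gpu.sh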
Verifying the successful execution of job steps
If the SLURM control daemon is too busy, it sometimes cancels the execution of a job step. To verify whether all job steps have completed, you can issue the following command, assuming your SLURM log (stdout and stderr) is slurm.log:
cat slurm.log -n | grep -v created | grep -v disabled | grep srun
If you do not see messages like:
- “srun: error: authentication: Socket communication error”, or
- “srun: error: Unable to create job step: Protocol authentication error”,
then you may assume all job steps have completed successfully.
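If SLURM accounting is enabled on the cluster, an alternative sketch is to ask the accounting database for the state of every job step (the job id is a placeholder):

# list all steps of a job that did not complete successfully
sacct -j <jobid> --format=JobID,JobName,State | grep -v COMPLETED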
Canceling multiple jobs
The command “scancel” does not support canceling multiple jobs at once; however, you can pipe formatted output from squeue to “xargs”, e.g.:
# -h suppresses the squeue header line, so only job ids are passed to scancel
squeue -h -p m610 -u meijerjjg -o "%i" | xargs -I{} scancel {}
Interactive jobs
sinteractive is a tiny wrapper around srun to create interactive jobs quickly and easily. It allows you to get a shell on one of the nodes, with the same kind of limits as a normal job. To use it, simply run:
sinteractive -c <num_cpus> --mem <amount_mem> --time <minutes> -p <partition>
You will then be presented with a new shell prompt on one of the compute nodes (run 'hostname' to see which!). From here, you can test out code in an interactive fashion as needs be.
Be advised, though: not filling in the above fields will get you a shell with 1 CPU and 100 MB of RAM for 1 hour. This is useful for quick testing.
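For example, a request for a 4-core, 4 GB shell for two hours on the 'main' partition (the values are only illustrative) would look like:

sinteractive -c 4 --mem 4G --time 120 -p main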
The source of sinteractive is here:
- sinteractive
#!/bin/bash
srun "$@" -I60 -N 1 -n 1 --pty bash -i