Submitting a job can be done easily with sbatch job.sbatch, where job.sbatch may contain the following.

Each sbatch script may contain options preceded with #SBATCH; these must appear before any executable commands in the script. See SlurmMD or the sbatch man page (man sbatch).
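As a quick reference, jobs are typically submitted and monitored with the following commands (a minimal sketch; job.sbatch and the job ID 1234 are placeholders):

sbatch job.sbatch        # submit the script, prints the job ID
squeue -u $USER          # list your pending and running jobs
scancel 1234             # cancel job 1234 if needed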

Examples:

simple.sbatch
#!/bin/bash
# parameters for slurm
#SBATCH -c 2                          # number of cores (2)
#SBATCH --gres=gpu:1                  # number of GPUs (1), remove if you don't use GPUs
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --mail-type=END,FAIL          # email status changes (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --time=1:00:00                # time limit 1h
 
# show actual node in output file, useful for diagnostics
hostname
 
# load all required software modules
module load nvidia/cuda-10.1 
 
# It's nice to have some information logged for debugging
echo "Gpu devices                 : "$CUDA_VISIBLE_DEVICES
echo "Starting worker: "
 
# Run the job -- make sure that it terminates itself before time is up
./gpu_burn 60  # only if your cluster has GPUs
advanced.sbatch
#!/bin/bash
# parameters for slurm
#SBATCH -J gpu-burn                   # job name, don't use spaces, keep it short
#SBATCH -c 2                          # number of cores, 1
#SBATCH --gres=gpu:1                  # number of GPUs (1), some clusters don't have GPUs
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --mail-type=END,FAIL          # email status changes (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=myname@utwente.nl   # Where to send mail to
#SBATCH --time=1:00:00                # time limit 1h
#SBATCH --output=job_test_%j.log      # Standard output and error log
#SBATCH --error=%j.err                # if you want the errors logged separately
#SBATCH --partition=50_procent_max_7_days # partition name; available partitions can be checked via sinfo
 
# Create a directory for this job on the node
cd /local
mkdir ${SLURM_JOBID}
cd ${SLURM_JOBID}
# Copy input and executable to the node
cp -r ${SLURM_SUBMIT_DIR}/input/* .
 
# load all modules needed 
module load nvidia/cuda-10.1 
module load mpi/openmpi-x86_64
 
# It's nice to have some information logged for debugging
echo "Date              = $(date)"
echo "Hostname          = $(hostname -s)" # log hostname
echo "Working Directory = $(pwd)"
echo "Number of nodes used        : "$SLURM_NNODES
echo "Number of MPI ranks         : "$SLURM_NTASKS
echo "Number of threads           : "$SLURM_CPUS_PER_TASK
echo "Number of MPI ranks per node: "$SLURM_TASKS_PER_NODE
echo "Number of threads per core  : "$SLURM_THREADS_PER_CORE
echo "Name of nodes used          : "$SLURM_JOB_NODELIST
echo "Gpu devices                 : "$CUDA_VISIBLE_DEVICES
echo "Starting worker: "
 
caseName=${PWD##*/} # to distinguish several log files
# Run the job -- make sure that it terminates itself before time is up
# Do not submit into the background (i.e. no & at the end of the line).
mpirun comsol batch -in inputFile > loga_$caseName.out
 
# Copy output back to the master, comment with # if not used
cp log_file.txt ${SLURM_SUBMIT_DIR}
cp simulation_data.csv ${SLURM_SUBMIT_DIR}
cp warnings_data.txt ${SLURM_SUBMIT_DIR}
mv output ${SLURM_SUBMIT_DIR}
# Clean up on the node ! make sure you are still on the node...
#rm *
#cd ..
#rmdir ${SLURM_JOBID}
 
# Done.

Another nice job script example can be found here

When allocating resources from SLURM, there are many options to control how SLURM makes decisions. Below we will explore some of these options.

The first line of the file is the shebang, which selects the shell that runs the script:

#!/bin/bash 

-J (job name)

The -J option lets you name your job. This can be convenient when you have many jobs running, as you can then tell them apart when querying running job information.
Usage: -J <job name>

#SBATCH -J run34

Names the job “run34”.

--mail-type (status email)

The --mail-type option tells SLURM how you want to be notified when an event occurs with your job. By default NONE is used. The other common options are:

  • BEGIN (email when job begins)
  • END (email when job ends)
  • FAIL (email when job fails)
  • ALL

Usage: --mail-type=<comma separated list>

#SBATCH --mail-type=END,FAIL

You will receive an email when your job ends and if it fails.

-c (cores)

The -c option tells SLURM how many cores you would like to use per task. These cores will all be on a single node, so the number you request must not exceed the number of cores available on a node. It is important to understand that simply giving more cores to your program won't always result in the additional cores being used: the program you are running must be programmed to take advantage of them (multiple threads or multiple processes). With the current HPC model, each core gives you about 7GB of RAM.
Usage: -c <# cores>

#SBATCH -c 4

Allocate 4 cores.
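For a multi-threaded program, the number of allocated cores is available in the SLURM_CPUS_PER_TASK environment variable and can be passed on explicitly. A minimal sketch, assuming an OpenMP executable (my_threaded_app is a placeholder for your own program):

#SBATCH -c 4

# use exactly the allocated cores, no more
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_threaded_app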

--gres (generic resource)

The --gres option tells SLURM which generic resources your job requires. With this option you can add GPU support to your job.
Usage: --gres=gpu[[:type]:count]

Allocating 1 GPU :

#SBATCH --gres=gpu:1

Allocating 2 Pascal GPUs :

#SBATCH --gres=gpu:pascal:2

For details on these resources, check : EEMCS-HPC features, resources and partitions

--constraint (feature)

The --constraint option tells SLURM which features are required to run your job. With this option you can add specific CPU/GPU requirements to your job.
Usage: --constraint="<feature>"

Allocating a Titan-X :

# using the Geforce Titan-X gpu(s)
#SBATCH --gres=gpu:1
#SBATCH --constraint="titan-x"

Allocating a Tesla P100 :

# using the Tesla P100 gpu(s)
#SBATCH --gres=gpu:1
#SBATCH --constraint="p100"

For details on these features, check : EEMCS-HPC features, resources and partitions
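The GRES and features offered by the nodes can also be inspected directly with sinfo; a minimal sketch (the output format string is just one possible choice):

# list nodes with their generic resources (GRES) and features
sinfo -o "%20N %10G %40f"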

--mem (memory)

The --mem option tells SLURM how much memory per node your job requires. SLURM will give 7GB to each core that you allocate, but if you need more, the --mem option can accomplish this. Requesting too much memory will keep other jobs from running on the node, so use this option with caution.
Usage: --mem=<# megabytes>

#SBATCH --mem=12288

Allocates 12GB (12288 MB) of memory.
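A unit suffix is also accepted instead of a plain megabyte count, as in the example scripts above; an equivalent sketch:

#SBATCH --mem=12G                     # same request as --mem=12288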

-n (ntasks)

The -n option tells SLURM how many copies of the same task to start. This is used for MPI jobs, like bertini, where you need to start many copies of the same program that communicate between themselves. The number of tasks can be any number up to the maximum you can allocate, and the tasks may be spread over multiple nodes.
Usage: -n <# tasks>

#SBATCH -n 45

This will allocate 45 tasks, allowing 45 processes of your application to be started.
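Inside the job script, the MPI launcher picks up the allocated task count automatically; a minimal sketch, assuming an MPI executable called my_mpi_app (a placeholder):

#SBATCH -n 45

module load mpi/openmpi-x86_64
# mpirun (or srun) inherits the SLURM allocation, so no -np is needed
mpirun ./my_mpi_app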

-N (nodes)

The -N option tells SLURM the minimum number of nodes you would like to use. This is used for MPI jobs, like bertini, where you want to spread the processing across at least that many nodes. This option is used together with the -c or -n option. If you specify -n 4, asking for 4 tasks, and -N 4, asking for 4 nodes, the tasks will be spread over 4 nodes, one task per node.
Usage: -N <# nodes>

#SBATCH -N 4

The job will use a minimum of 4 nodes
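The combination described above would look like this in a job script (a minimal sketch):

#SBATCH -N 4                          # at least 4 nodes
#SBATCH -n 4                          # 4 tasks in total, so one task per node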

-p (partition)

The -p option tells SLURM which partition of machines to use. The partitions are made up of similar machines that are administratively separated for specific use. If you don't specify this option, the “main” partition, of which every node is a member, is used. Other partitions are created for exclusive access to nodes.
Usage: -p <partition name>

#SBATCH -p main

The job will use the machines in the “main” partition.

For details on these partitions, check : EEMCS-HPC features, resources and partitions
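The available partitions, their time limits and their nodes can be listed with sinfo (the format string is just one possible choice):

sinfo                                 # one line per partition and node state
sinfo -o "%20P %10l %N"               # partition, time limit, node list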

--qos (quality of service)

Every user on the cluster is assigned a default Quality of Service (QOS); for special purposes, extended QOS levels are created. These extended QOS levels need to be requested; once granted, the extended QOS can be activated.
Usage: --qos=<qos-name>

#SBATCH --qos=students-deadline
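Which QOS levels are associated with your account can be checked with sacctmgr (a minimal sketch; the fields shown may differ per cluster):

sacctmgr show assoc user=$USER format=User,Account,QOS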

--time (time)

By default a job has a time limit of 21 days. This is a soft limit that can be overridden from within a batch file or after a job has been started. UNLIMITED is also a valid value for the time limit.
Usage: --time=<D-HH:MM>

#SBATCH --time=32-00:00

The job will have a time limit of 32 days.

Modifying the job's time limit

A running job can be extended using scontrol and the JOBID.

scontrol update jobid=1234 TimeLimit=32-00:00

The job 1234 will now have a time limit of 32 days.
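The new limit can be verified afterwards, for example:

scontrol show job 1234 | grep TimeLimit
squeue -j 1234 -o "%.10i %.15l"       # job ID and time limit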