SLURM sbatch
Submitting a job can be done with sbatch job.sbatch, where job.sbatch may contain the following.
Each sbatch script may contain options prefixed with #SBATCH; these options must appear before any executable commands in the script. See SlurmMD or the sbatch man page.
Examples:
- simple.sbatch
#!/bin/bash
# parameters for slurm
#SBATCH -c 2                       # number of cores
#SBATCH --gres=gpu:1               # number of gpus 1, remove if you don't use GPUs
#SBATCH --mem=1gb                  # Job memory request
#SBATCH --mail-type=END,FAIL       # email status changes (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --time=1:00:00             # time limit 1h

# show actual node in output file, useful for diagnostics
hostname

# load all required software modules
module load nvidia/cuda-10.1

# It's nice to have some information logged for debugging
echo "Gpu devices : "$CUDA_VISIBLE_DEVICES
echo "Starting worker: "

# Run the job -- make sure that it terminates itself before time is up
./gpu_burn 60    # if your cluster has GPU
- advanced.sbatch
#!/bin/bash
# parameters for slurm
#SBATCH -J gpu-burn                       # job name, don't use spaces, keep it short
#SBATCH -c 2                              # number of cores
#SBATCH --gres=gpu:1                      # number of gpus 1, some clusters don't have GPUs
#SBATCH --mem=1gb                         # Job memory request
#SBATCH --mail-type=END,FAIL              # email status changes (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=myname@utwente.nl     # Where to send mail to
#SBATCH --time=1:00:00                    # time limit 1h
#SBATCH --output=job_test_%j.log          # Standard output and error log
#SBATCH --error=%j.err                    # if you want the errors logged separately
#SBATCH --partition=50_procent_max_7_days # Partition name, can be checked via sinfo

# Create a directory for this job on the node
ScratchDir="/local/${SLURM_JOBID}"
if [ -d "$ScratchDir" ]; then
    echo "'$ScratchDir' already found !"
else
    echo "'$ScratchDir' not found, creating !"
    mkdir $ScratchDir
fi
cd $ScratchDir

# Copy input and executable to the node
cp -r ${SLURM_SUBMIT_DIR}/input/* $ScratchDir

# load all modules needed
module load nvidia/cuda-10.1
module load mpi/openmpi-x86_64

# It's nice to have some information logged for debugging
echo "Date                        = $(date)"
echo "Hostname                    = $(hostname -s)"   # log hostname
echo "Working Directory           = $(pwd)"
echo "Number of nodes used        : "$SLURM_NNODES
echo "Number of MPI ranks         : "$SLURM_NTASKS
echo "Number of threads           : "$SLURM_CPUS_PER_TASK
echo "Number of MPI ranks per node: "$SLURM_TASKS_PER_NODE
echo "Number of threads per core  : "$SLURM_THREADS_PER_CORE
echo "Name of nodes used          : "$SLURM_JOB_NODELIST
echo "Gpu devices                 : "$CUDA_VISIBLE_DEVICES
echo "Starting worker: "

caseName=${PWD##*/}    # to distinguish several log files

# Run the job -- make sure that it terminates itself before time is up
# Do not submit into the background (i.e. no & at the end of the line).
mpirun comsol batch -in inputFile > loga_$caseName.out

# Copy output back to the master, comment with # if not used
cp log_file.txt ${SLURM_SUBMIT_DIR}
cp simulation_data.csv ${SLURM_SUBMIT_DIR}
cp warnings_data.txt ${SLURM_SUBMIT_DIR}
mv output ${SLURM_SUBMIT_DIR}

# Clean up on the compute node !
cd ~
if [ -d "$ScratchDir" ]; then
    echo "'$ScratchDir' found, cleaning up, please wait ..."
    rm -rf $ScratchDir
else
    echo "Warning: '$ScratchDir' NOT found."
fi
# Done.
Another nice job script example can be found here
Common Options
When allocating resources from SLURM, there are many options to control how SLURM makes decisions. Below we will explore some of these options.
The first line of the file
#!/bin/bash
-J (job name)
The -J option lets you name your job. This can be convenient when you have many jobs running, as you can then tell them apart when querying running job information.
Usage: -J <job name>
#SBATCH -J run34
Names job “run34”
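To check the name afterwards, you can for example list your own jobs with squeue; the NAME column shows the value given with -J:
squeue -u $USER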
--mail-type (status email)
The --mail-type option tells SLURM how you want to be notified when an event with your job occurs. By default NONE is used. The other common options are:
- BEGIN (email when job begins)
- END (email when job ends)
- FAIL (email when job fails)
- ALL
Usage: --mail-type=<comma separated list>
#SBATCH --mail-type=END,FAIL
You will receive an email when your job ends and if it fails.
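Note that the mail is sent to the address given with --mail-user (see the advanced example above), so the two options are usually combined:
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=myname@utwente.nl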
-c (cores)
The -c option tells SLURM how many cores you would like to use per task. These cores will all be on a single node, so the number you request must be below the total core count of the available nodes. It is important to understand that simply giving more cores to your program won't always result in the additional cores being used. The program you are running must be programmed to take advantage of them (multiple threads or multiple processes). With the current HPC model, each core gives you about 7 GB of RAM.
Usage: -c <# cores>
#SBATCH -c 4
Allocate 4 cores.
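Whether the extra cores are actually used is up to your program. For a threaded application it is common to pass the allocated core count on explicitly; a minimal sketch, with my_threaded_app as a hypothetical executable:
#SBATCH -c 4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # use exactly the allocated cores
./my_threaded_app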
--gres (generic resource)
The --gres option tells SLURM which generic resources your job requires. With this option you can add GPU support to your job.
Usage: --gres=gpu[[:type]:count]
Allocating 1 gpu :
#SBATCH --gres=gpu:1
Allocating 2 Pascal gpus :
#SBATCH --gres=gpu:pascal:2
For details on these resources, check : EEMCS-HPC features, resources and partitions
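Inside the job, SLURM exposes the allocated devices through CUDA_VISIBLE_DEVICES (also logged in the examples above); a quick sanity check, assuming nvidia-smi is available on the node:
echo "Gpu devices : "$CUDA_VISIBLE_DEVICES
nvidia-smi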
--constraint (feature)
The --constraint option tells SLURM which features are required to run your job. With this option you can add specific CPU/GPU requirements to your job.
Usage: --constraint="<feature>"
Allocating a Titan-X :
# using the Geforce Titan-X gpu(s)
#SBATCH --gres=gpu:1
#SBATCH --constraint="titan-x"
Allocating a Tesla P100 :
# using the Tesla P100 gpu(s)
#SBATCH --gres=gpu:1
#SBATCH --constraint="p100"
For details on these features, check : EEMCS-HPC features, resources and partitions
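The features defined on each node can also be listed directly with sinfo, for example:
sinfo -o "%N %f"   # node names and their available features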
--mem (memory)
The --mem option tells SLURM how much memory per node your job requires. SLURM will give 7 GB to each core that you allocate, but if you need more, the --mem option can accomplish this. Taking too much memory will keep other jobs from running on the node, so use it with caution.
Usage: --mem=<# megabytes>
#SBATCH --mem=12288
Allocate 12GB memory.
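A unit suffix is also accepted, as in the 1gb requests in the example scripts; the following requests the same 12GB:
#SBATCH --mem=12G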
-n (ntasks)
The -n option tells SLURM how many copies of the same task to start. This is used for MPI jobs like bertini, where you need to start many copies of the same program that communicate between themselves. The number of tasks can be any number up to the maximum you can allocate, and the tasks may be spread over multiple nodes.
Usage: -n <# tasks>
#SBATCH -n 45
This will start 45 processes of your application
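Inside the script the task count is available as $SLURM_NTASKS, which MPI launchers can use. A minimal sketch, with my_mpi_app as a hypothetical executable and the openmpi module from the advanced example:
#SBATCH -n 45
module load mpi/openmpi-x86_64
mpirun -np $SLURM_NTASKS ./my_mpi_app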
-N (nodes)
The -N option tells SLURM the minimum number of nodes you would like to use. This is used for MPI jobs like bertini where you want to spread the processing across multiple nodes. This option is used together with the -c or -n option. If you specify -n 4, asking for 4 tasks, and -N 4, asking for 4 nodes, the tasks will be spread over 4 nodes, one task per node.
Usage: -N <# nodes>
#SBATCH -N 4
The job will use a minimum of 4 nodes
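Combining the two options from the example in the text (4 tasks over a minimum of 4 nodes) could look like this, again with a hypothetical MPI executable:
#SBATCH -N 4
#SBATCH -n 4
mpirun ./my_mpi_app   # one task per node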
-p (partition)
The -p option tells SLURM which partition of machines to use. Partitions group similar machines that are administratively separated for specific use. If you don't specify this option, the "main" partition, of which every node is a member, is used. Other partitions are created for exclusive access to nodes.
Usage: -p <partition name>
#SBATCH -p main
The job will use the machines in the “main” partition.
For details on these partitions, check : EEMCS-HPC features, resources and partitions
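The available partitions, their time limits and their nodes can be listed with sinfo, for example:
sinfo -o "%P %l %D %N"   # partition, time limit, node count, node list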
--qos (quality of service)
Every user on the cluster is assigned a default Quality of Service (QOS); for special purposes, extended QOS levels are created. These extended QOS levels need to be requested; once granted, the extended QOS can be activated.
Usage: --qos=<qos-name>
#SBATCH --qos=students-deadline
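Which QOS levels are associated with your account can usually be checked with sacctmgr; a sketch (the exact output depends on the cluster configuration):
sacctmgr show assoc where user=$USER format=User,QOS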
--time (time)
By default, a job has a time limit of 21 days. This is a soft limit that can be overridden from within a batch file or after a job has been started. UNLIMITED is also a valid value for the time limit.
Usage: --time=<D-HH:MM>
#SBATCH --time=32-00:00
The job will have a time limit of 32 days.
Modifying the job's time limit.
A running job can be extended using scontrol and the JOBID.
scontrol update jobid=1234 TimeLimit=32-00:00
The job 1234 will now have a time limit of 32 days.
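The remaining time of a running job can be checked with squeue, for example:
squeue -j 1234 -o "%L"   # time left for job 1234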