EEMCS-HPC Specific resources (gpu), features and partitions

Generic Resources

The generic consumable resources on the HPC cluster are:

To request a GPU, add the following line to your sbatch script:

#SBATCH --gres=gpu:1 

To request a specific GPU family, add the family name to the GPU request. For example, to select the lovelace family, use the following line in your sbatch script:

#SBATCH --gres=gpu:lovelace:1 

Keep in mind that for GPUs you need to load the module for the required CUDA version!

Once you request GPU resources, the scheduler sets the environment variable CUDA_VISIBLE_DEVICES for your job.

It points to the GPU(s) assigned to your job; use only those and no others!
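As an illustration, a job script can read CUDA_VISIBLE_DEVICES to see which GPUs it was assigned. The sketch below uses a hypothetical example value; inside a real job you would read "$CUDA_VISIBLE_DEVICES" directly, since the scheduler sets it for you.

```shell
#!/usr/bin/env bash
# Example value for illustration only; inside a real SLURM job,
# read "$CUDA_VISIBLE_DEVICES" as set by the scheduler instead.
devices="0,1"

# Split the comma-separated list into an array of GPU indices.
IFS=',' read -ra GPUS <<< "$devices"

echo "Assigned ${#GPUS[@]} GPU(s): ${GPUS[*]}"  # prints: Assigned 2 GPU(s): 0 1
```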

Some GPU boards are fitted with NVLink modules; this effectively doubles the available GPU memory and computing power. If you request two GPUs with NVLink, you need to force socket binding using the following option:

#SBATCH --sockets-per-node=1 
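Putting the NVLink-related directives together, a two-GPU request might look like the following sketch (whether NVLink pairs are available on a given node is cluster-specific):

```shell
#SBATCH --gres=gpu:2           # request two GPUs on one node
#SBATCH --sockets-per-node=1   # force socket binding for an NVLink pair
```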

features (constraint)

The available features are:

For example, to force only a40 GPUs:

#SBATCH --constraint=a40 

Hint: use the ampersand (&, AND) and pipe (|, OR) symbols to combine features.
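For instance, assuming the cluster exposes l40 and nvlink as features alongside a40 (those two names are hypothetical here), constraints can be combined like this:

```shell
# Two alternative examples; use one per job script
# (the second is disabled with a double # so sbatch ignores it):
#SBATCH --constraint="a40|l40"      # OR: any node with an a40 or an l40
##SBATCH --constraint="a40&nvlink"  # AND: a node with both features
```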

partitions

The HPC/SLURM cluster contains multiple common partitions:

Partition name | Nodes | Details | Available to | Default time limit
main | ctit[080-094],caserta,hpc-node[01-12,14,16-18] | for mixed use | All |
main-cpu | ctit087,caserta,spark-head[1-4] | for CPU jobs only | All | 24h
main-gpu | ctit[084-086],ctit[088-094],hpc-node[01-12,14,16-18] | for GPU jobs only | All | 24h

Compute nodes with GPUs are allowed to run CPU jobs, but keep CPU resources available for GPU jobs!

As well as faculty/group-specific additional partitions:

Partition name | Nodes | Details | Available to | Default time limit
tfe-cpu | hpc-node[20-26] | for CPU jobs only | et-(efd/msm/te/tcs/htt/gm/cmmm) | 24h
tfe-gpu | hpc-node[16-19] | for GPU jobs only | et-(efd/msm/te/tcs/htt/gm/cmmm) | 24h
itc-cpu | hpc-node[27-28,32] | for CPU jobs only | itc-(gaia/life/plan/tech) | 24h
itc-gpu | hpc-node[29-30] | for GPU jobs only | itc-(gaia/life/plan/tech) | 24h
am | hpc-node[01-04] | | eemcs-(dmmp/macs/mast/mia/mms/sor/stat) |
bdsi | ctit087 | | bms-bdsi |
bss | hpc-node15 | cpu | eemcs-bss |
bmpi | hpc-node[16-18] | gpu | tnw-bmpi |
smm | hpc-node19 | gpu | et-cem-smm |
dmb | ctit[084-085,092],hpc-node[07,09,31] | | eemcs-dmb |
mia | ctit[090-091,093-094],hpc-node05 | | eemcs-mia |
mia-pof | hpc-node06 | | eemcs-mia & tnw-pof |
tfe | hpc-node[16-19] | mixed | et-(efd/msm/te/tcs/htt/gm/cmmm) |
ps | hpc-node[11-12,14] | | eemcs-ps |
ram | ctit[086,089] | | eemcs-ram |
students | hpc-node08 | | eemcs-students |

The following partition is only available for admins:

debug | ctit[080-094],caserta,hpc-node[01-12,14-19] | | admin |

The main partition is the default; jobs submitted without an explicit partition go there and can run on any of its nodes. The debug partition is for testing purposes only.

Access to the additional partitions is limited to the funders during the first year of the investment; these can be reached using the funders' partitions.

Including multiple partitions is also possible. For example:

#SBATCH --partition=main,dmb
#SBATCH --partition=main,am,mia
#SBATCH --partition=main,students
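Putting the options from this page together, a complete job script could look like the following sketch; the job name, time limit, and program name are assumptions, not cluster-mandated values.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=gpu-example   # hypothetical job name
#SBATCH --partition=main,dmb     # submit to both; runs in whichever starts first
#SBATCH --gres=gpu:1             # one GPU of any family
#SBATCH --time=04:00:00          # example limit, under the 24h default

# Load the module for the CUDA version your application requires.
module load cuda

# The scheduler sets CUDA_VISIBLE_DEVICES; use only the assigned GPU(s).
echo "Running on GPU(s): $CUDA_VISIBLE_DEVICES"
srun ./my_gpu_program            # hypothetical application
```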

See the EEMCS-HPC Hardware page for all the partition definitions.