To use the additional Slurm helper scripts, load the following module:

module load slurm/utils

A user is a single person; an account is a group of users, or even a group of groups.

Activating Users

  1. Add the user to the NIS database. The user will be added to the default cluster account ctit.
  2. Move the user to the correct cluster account (read: research or student group).

For the second step you can use the following script:

sacctmgr-move-user ctit <groupname> <username>
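
When several users have to be moved at once, the call above can be generated in a loop. The following is a minimal sketch and not part of slurm/utils: the function name print_move_commands and the input format (one "<username> <groupname>" pair per line) are assumptions. It only prints the commands, so they can be reviewed before being run.

```shell
# Hypothetical helper (not part of slurm/utils): read "<username> <groupname>"
# pairs from stdin and print one sacctmgr-move-user call per pair.
# Nothing is executed; pipe the output to sh once it looks right.
print_move_commands() {
    while read -r user group; do
        # Skip empty lines and lines starting with '#'.
        case "$user" in
            ''|'#'*) continue ;;
        esac
        # Skip incomplete lines that lack a group name.
        [ -n "$group" ] || continue
        printf 'sacctmgr-move-user ctit %s %s\n' "$group" "$user"
    done
}
```

For example, print_move_commands < users.txt prints one line per pair, where users.txt is a hypothetical file in the format described above.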

Adjusting the Quality of Service (QOS)

Every account already has a default QOS defined.

In some cases a different QOS is required. To modify the Quality of Service, use the following commands:

# default QOS
sudo sacctmgr modify user where name=<username> set DefaultQOS=<defaultQOSname>
# additional QOS (temporary)
sudo sacctmgr modify user where name=<username> set QOS+=<additionalQOSname>
# remove additional QOS (temporary)
sudo sacctmgr modify user where name=<username> set QOS-=<additionalQOSname>

Creating new Accounts

Creating new accounts (read: research groups):

sudo sacctmgr create account name=<groupname> parent=<faculty> fairshare=1

The fairshare factor is the group's investment in k€.
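
As a rough illustration of what the fairshare value means: Slurm compares the shares of accounts that have the same parent, so an account's slice at that level is its fairshare value divided by the sum of the values of all accounts under that parent. The sketch below works this out with awk. The function name share_percent is an assumption, and any account names and amounts fed to it here are made up.

```shell
# Illustrative helper: print each account's share as a percentage of the
# total, given "<account> <fairshare>" lines on stdin.
# Account names and amounts used with it are made up.
share_percent() {
    awk '
        NF >= 2 { name[++n] = $1; share[n] = $2; total += $2 }
        END {
            if (total <= 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%s %.1f%%\n", name[i], 100 * share[i] / total
        }
    '
}
```

With two sibling accounts that invested 50 and 150 k€, printf 'groupa 50\ngroupb 150\n' | share_percent reports 25.0% and 75.0%.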

Dumping the user database

To recover the accounts, or to get an overview of all activated accounts, a configuration dump file can be generated using the following command:

sudo sacctmgr-dump

This will create a ctit_<date>.cfg file containing all the accounts/users and their priority factor structure.

GPUs can be monitored using the tools supplied in the module nvidia/nvtop:

module load nvidia/nvtop

To monitor the GPUs on a specific node, use one of the following commands:

nvidia-smi-node <nodename>
nvtop-node <nodename>

To show the GPUs assigned to a job, use the following command:

scontrol show job <jobid> -d

Note: look for GRES=gpu(IDX:…)
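
To pick the GPU indices out of the detailed job output without reading it by eye, the IDX field can be filtered with grep and sed. This is a minimal sketch: the function name job_gpu_idx is an assumption, and the exact layout of the GRES field may differ between Slurm versions.

```shell
# Hypothetical helper: read `scontrol show job <jobid> -d` output on stdin
# and print the contents of every IDX:... field, one per line.
job_gpu_idx() {
    grep -o 'IDX:[^)]*' | sed 's/^IDX://'
}
```

Usage: scontrol show job <jobid> -d | job_gpu_idx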

Utilization reports can be generated using one of the following commands:

sreport cluster AccountUtilization cluster=ctit start=1/1/21 end=12/31/21 > Utilisation_2021
sreport cluster AccountUtilizationByUser cluster=ctit account=<account_name> start=2020-03-25 end=2020-03-25
sreport cluster AccountUtilizationByUser cluster=ctit user=<user_name> start=2020-03-25 end=2020-03-25
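
For recurring monthly reports the start and end arguments can be computed instead of typed. The sketch below is one way to do that. The function name month_range is an assumption, as is the reading that a date without a time means midnight at the start of that day, which is why end is set to the first day of the following month.

```shell
# Hypothetical helper: print start= and end= arguments covering one calendar
# month. Assumption: a date given without a time is read as midnight at the
# start of that day, so end is set to the first day of the following month.
month_range() {
    year=$1
    month=${2#0}                 # strip a leading zero so 08 is not read as octal
    if [ "$month" -eq 12 ]; then
        next_year=$((year + 1))
        next_month=1
    else
        next_year=$year
        next_month=$((month + 1))
    fi
    printf 'start=%04d-%02d-01 end=%04d-%02d-01\n' \
        "$year" "$month" "$next_year" "$next_month"
}
```

Usage: sreport cluster AccountUtilizationByUser cluster=ctit account=<account_name> $(month_range 2020 3)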

Creating a maintenance reservation

To create a maintenance reservation, use the following command:

scontrol create reservation starttime=2022-03-23T08:00:00 duration=480 user=root flags=maint,ignore_jobs nodes=ALL
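
The duration value is given in minutes, so 480 corresponds to an eight-hour window. The following sketch builds the same command from a start time and a length in hours. The function name maint_reservation_cmd is an assumption, and the command is only printed, not run.

```shell
# Hypothetical helper: print the reservation command for a maintenance window
# given a start time and a length in hours. duration= is passed in minutes,
# as in the command above. Nothing is executed.
maint_reservation_cmd() {
    start=$1
    hours=$2
    printf 'scontrol create reservation starttime=%s duration=%d user=root flags=maint,ignore_jobs nodes=ALL\n' \
        "$start" "$((hours * 60))"
}
```

Usage: maint_reservation_cmd 2022-03-23T08:00:00 8. Once the printed command has been run, scontrol show reservation lists the resulting reservation.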

Terminating running jobs

squeue -ho %A -t R | xargs -n 1 scancel

Stopping Slurm daemons

scontrol shutdown

Undraining a node

sudo scontrol update NodeName=<node_name> State=DOWN Reason="undraining"
sudo scontrol update NodeName=<node_name> State=RESUME

System serial number

sudo dmidecode -s system-serial-number

To keep power usage low, compute nodes that are not in use are powered down after a certain amount of time. These settings are located in the slurm.conf file; see SuspendTime for the actual time. To disable this functionality (SuspendExcParts excludes the listed partitions from being powered down), change the following line:

#SuspendExcParts=debug

to:

SuspendExcParts=debug
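
To check which of the two states a given slurm.conf is in, the line can be tested with grep. This is a minimal sketch: the function name suspend_exclusion_status is an assumption, and the path to slurm.conf is passed as an argument because its location differs between installations.

```shell
# Hypothetical helper: report whether SuspendExcParts is active (uncommented)
# in the given slurm.conf. The path is passed by the caller because its
# location differs between installations.
suspend_exclusion_status() {
    conf=$1
    if grep -Eiq '^[[:space:]]*SuspendExcParts[[:space:]]*=' "$conf"; then
        echo "active"
    else
        echo "commented out or missing"
    fi
}
```

Usage: suspend_exclusion_status <path to slurm.conf>. After editing the file, the daemons typically need to re-read the configuration, for example with scontrol reconfigure.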