(BMS/EEMCS/S&T/ET) HPC Cluster

Introduction

One of the clusters at the University is the (BMS/EEMCS/S&T/ET) HPC Cluster. This cluster, funded by the DSI research institute (formerly known as CTIT), started in 2017 as a joint operation of several research groups working on deep learning / AI methods. Over the years more groups joined and the cluster was expanded with several more nodes containing multi CPU/GPU combinations. The currently participating faculties are BMS, EEMCS, TNW (S&T) and ET of the University of Twente.

This HPC cluster is a collection of many separate servers (computers), called compute nodes, which are connected via a fast interconnect.

There may be different types of nodes for different types of tasks. The HPC cluster described on this wiki has :

- a head node or login node, where users log in
- multiple data nodes
- CPU compute nodes
- a “fat” CPU compute node with 2TB of memory
- CPU+GPU compute nodes (on these nodes computations can run both on CPU cores and on Graphics Processing Unit cards)

All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space. The difference between a personal computer and a cluster node is in the quantity, quality and power of the components.

For the list of hardware used in this cluster, see the HPC Hardware page.

Slurm Scheduler

This cluster is based on the Slurm scheduler 21.08.5 running on Ubuntu 22.04 LTS.

To monitor your jobs and their progress, you can use the HPC Slurm dashboard page or the available command line tools such as squeue or scontrol.
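
As a rough illustration of the command line tools, the sketch below lists your own jobs and prints the scheduler record of one job. It assumes you are logged in on a head node where the Slurm client tools are on the PATH; the job number 12345 is a hypothetical placeholder.

```shell
#!/bin/bash
# Sketch only: list your own jobs and inspect one of them.
# JOBID is a hypothetical placeholder; pass a real job number as first argument.
JOBID="${1:-12345}"
ME="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # Queued and running jobs of the current user.
    squeue -u "$ME"
    # Full scheduler record of one job (state, node list, time limit, ...).
    scontrol show job "$JOBID"
    SLURM_TOOLS=yes
else
    echo "squeue not found: run this on one of the head nodes."
    SLURM_TOOLS=no
fi
```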

To use specific resources please check the HPC features, resources and partitions page.

See the Slurm/HPC scheduler info page for more information.

Login Nodes

You can connect to one of the head nodes : hpc-head1.ewi.utwente.nl or hpc-head2.ewi.utwente.nl
See the connection info page on how to connect.

Do NOT log in to the compute nodes, either directly or through ssh; log in ONLY on one of the head nodes!

Maintenance

Upcoming maintenance :

Oct 2025, Software updates.

During the maintenance day, the whole cluster will go offline.

Alternatives

For smaller experiments and interactive jobs, please try other resources like :

Jupyter Lab (Utwente), Jupyter Lab Wiki (Utwente)

or external providers like :

Access

Who has access?

Members of the following groups get a raised priority :
- BMS
  - BMS-BDSI
- EEMCS
  - EEMCS-BSS
  - EEMCS-DMB
  - EEMCS-FMT
  - EEMCS-HMI
  - EEMCS-SACS-MACS
  - EEMCS-SACS-MAST
  - EEMCS-SACS-MIA
  - EEMCS-SACS-MMS
  - EEMCS-MOR-SOR
  - EEMCS-MOR-DMMP
  - EEMCS-MOR-STAT
  - EEMCS-PS
  - EEMCS-RAM
  - EEMCS-SCS
  - EEMCS-Students
- TNW
  - TNW-BMPI
  - TNW-POF
  - TNW-M3I
- ET
  - ET-TFE
Non-member employees, students and guests are allowed as well, but get only a basic priority factor.

To get access, you need an AD account of the University of Twente. All students and employees have such an account, and one can be arranged for external persons. To get your AD account enabled for these clusters, contact one of the contact persons.

Partitions

Access to the group partitions listed below is limited to their funders during the first year of investment; funders reach these nodes through their own partition.

The HPC/SLURM cluster contains multiple common partitions :

Partition name	available to
main	All (default)
dmb	eemcs-dmb
ram	eemcs-ram
bdsi	bms-bdsi
mia	eemcs-mia
am	eemcs-(dmmp/macs/mast/mia/mms/sor/stat)
mia-pof	eemcs-mia & tnw-pof
…	…
students	eemcs-students

* For now the students partition is only for course-related work; BSc and/or MSc students will have access to the partition of their research group.

See the HPC specifics page, partition option, for how to select these partitions.
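
A minimal sketch of selecting a partition from within a job script is given here, using the standard Slurm --partition option. The partition name main comes from the table; the time limit is a hypothetical value and the HPC specifics page remains the authoritative description.

```shell
#!/bin/bash
#SBATCH --partition=main     # a partition name from the table
#SBATCH --time=00:05:00      # hypothetical short time limit for this demo

# Inside a running job Slurm sets SLURM_JOB_PARTITION; the fallback only
# applies when the script is run outside the scheduler.
PARTITION="${SLURM_JOB_PARTITION:-none (not running under Slurm)}"
echo "Running in partition: ${PARTITION}"
```

The same option can also be given on the command line, for example sbatch --partition=main job.sh (job.sh being a hypothetical script name); a value given on the command line takes precedence over the one in the script.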

HPC Priority

The participating groups have invested in the HPC cluster and therefore have a higher priority than non-participating groups.
To gain more priority, your group can invest in the HPC cluster. Depending on the kind of investment, this will result in :

- Sole usage of the purchased compute node(s) for a time span of approx. one year.
- A higher priority factor related to the total amount invested; retirement of hardware (after roughly 8 years) reduces the priority factor.
- Participation in a higher Quality of Service (QOS).

This combination will guarantee more priority and calculation time on the cluster.

Contact persons

Please consult the corresponding contact person for this :

Geert Jan Laanstra (EEMCS-DMB/SCS)
Jan Flokstra (EEMCS-DMB/HMI)
Martin Wilens (ET-MSM)
?
Frederik Reenders (LISA-ITO, only for calamities)

Admin Page

See the HPC Admin page for more information.

Credentials

Accounts

For staff, the username is probably your family name followed by your initials; for students it is your student number starting with an “s”; for guest accounts it starts with an “x”.

DSI Computing Lab does not store your password and cannot reset it. If you require password assistance, please visit the ICTS/LISA Servicedesk.

Mailing list

For the HPC/SLURM cluster, two mailing lists have been created :

- EEMCS-Hpc-Cluster-Users (all the users)
- EEMCS-Hpc-Cluster-Managers (all the managers)

Connecting to the cluster

Access to the HPC computing resources is provided via secure shell (SSH) login.

Most Unix-like operating systems (Mac OS X, Linux, etc.) provide an ssh utility by default, which can be used by typing the command ssh in a terminal window. See the connecting page for alternatives and more information.

You can connect to one of the head nodes : hpc-head1.ewi.utwente.nl or hpc-head2.ewi.utwente.nl
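
For illustration, a small shell helper that wraps the ssh command could look like the sketch below. The function name hpc_login is hypothetical and not part of the cluster software; it only assumes the standard ssh utility and the head node names given on this page.

```shell
# Hypothetical helper for your own ~/.bashrc: open an SSH session on a head node.
# Usage: hpc_login <username> [head-node]
hpc_login() {
    if [ -z "$1" ]; then
        echo "usage: hpc_login <username> [head-node]" >&2
        return 1
    fi
    # Default to the first head node; hpc-head2.ewi.utwente.nl works the same way.
    ssh "${1}@${2:-hpc-head1.ewi.utwente.nl}"
}
```

A student would then call it as hpc_login s1234567 (a made-up student number), or add hpc-head2.ewi.utwente.nl as second argument to use the other head node.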

Setting up

Software

The cluster machines run Ubuntu Server 22.04 LTS; some basic packages from the repositories have been installed. Additional software is available through module files.

See the HPC Software page for more information.
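
As a rough sketch of working with module files, the commands below list, load and show modules. They assume the usual module command (avail, load, list) is available in your shell on the cluster; monitor/node is the module named in the monitoring section of this page, and the HPC Software page lists what is actually installed.

```shell
# Sketch: inspect and load software modules on the cluster.
if command -v module >/dev/null 2>&1; then
    module avail                # list the module files that can be loaded
    module load monitor/node    # module named in the monitoring section of this page
    module list                 # show what is loaded in the current shell
    HAVE_MODULES=yes
else
    echo "The module command is not available in this shell."
    HAVE_MODULES=no
fi
```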

Storage

The following folders are available :

Network wide personal folder :
- /home/<username> Home folder : This is where you store your code and project-related materials; a “small” amount of data is allowed within your home folder.
  - Don’t keep data for longer periods, get rid of bad results as soon as possible.
  - If it contains a static dataset, it should be moved to /deepstore/datasets/……

Network wide global folder :
- /deepstore/datasets Dataset folder : This is the location for mainly static and/or large data(sets).
  - Datasets should be stored here, preferably not in your user folder.
  - New folders are available on request.

Network wide global folder :
- /projects Projects folder : Shared directories for projects, shared work area.
  - New folders are available on request.

Local scratch folder :
- /local/<jobid> (preferred) or /local/<username> or /local/<projectname> scratch folder : Use this space to store intermediate data during a job run, to speed up processing and reduce network traffic.
  - At the start of your job, you can create a local folder and store temporary data here.
  - At the end of the job, you should remove your data and created folders.
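
The create-use-remove pattern for the local scratch folder can be sketched as a job script like the one below. The job name, time limit and result location are hypothetical; the fallback to TMPDIR or /tmp is an assumption added only so the sketch also runs on a machine without /local.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo   # hypothetical job name
#SBATCH --time=00:10:00           # hypothetical time limit

# /local is the scratch location on the compute nodes. The fallback is an
# assumption so that the sketch also runs where /local does not exist.
SCRATCH_ROOT="/local"
[ -d "$SCRATCH_ROOT" ] && [ -w "$SCRATCH_ROOT" ] || SCRATCH_ROOT="${TMPDIR:-/tmp}"

# Slurm sets SLURM_JOB_ID inside a job; outside a job the shell PID is used.
JOB_TAG="${SLURM_JOB_ID:-manual-$$}"
JOB_SCRATCH="${SCRATCH_ROOT}/${JOB_TAG}"

# Create the folder at the start of the job ...
mkdir -p "$JOB_SCRATCH"
# ... and remove it when the script exits, whether it succeeded or not.
trap 'rm -rf "$JOB_SCRATCH"' EXIT

# Placeholder for the real work: write intermediate data to the local disk.
echo "intermediate result" > "$JOB_SCRATCH/intermediate.txt"

# Copy what you want to keep back to network storage before the job ends.
RESULT_DIR="${RESULT_DIR:-$PWD}"   # hypothetical destination
cp "$JOB_SCRATCH/intermediate.txt" "$RESULT_DIR/result-${JOB_TAG}.txt"
```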

Quota

A quota is active on the /home/<username> folder; this means we limit the amount of data in your personal folder.

- Below 1TB : This is fine. Keep your data size below this threshold and clean up where possible!
- Over 1TB : Writing will be blocked; you will have to remove data.

Due to the change of the file system to ZFS, a soft limit at 1TB combined with a hard limit at 2TB is no longer possible!
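
To get a rough idea of how close you are to the limit, a sketch like the following can be used. It relies only on du and awk, and it assumes that 1TB means 10^12 bytes; the way the file system accounts for the quota may differ from what du reports, so treat the outcome as an estimate.

```shell
# Sketch: compare the size of your home folder with the 1TB limit.
# Assumption: 1TB is read as 10^12 bytes; the real quota accounting may differ.
HOME_DIR="${HOME:-.}"
LIMIT_KIB=976562500      # 10^12 bytes expressed in KiB

# du -sk prints the total size in KiB; this can take a while on a large folder.
USED_KIB=$(du -sk "$HOME_DIR" 2>/dev/null | awk '{print $1}')
USED_KIB="${USED_KIB:-0}"

PERCENT=$(( USED_KIB * 100 / LIMIT_KIB ))
echo "Home folder uses ${USED_KIB} KiB, about ${PERCENT}% of the limit."

if [ "$USED_KIB" -ge "$LIMIT_KIB" ]; then
    echo "Over the limit: writing will be blocked, remove data."
fi
```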

Submitting Jobs

Batch Jobs

Slurm sbatch is used to submit a job script for later execution.
The script will typically contain the scheduler parameters, setup commands and the processing task(s) or (if required) multiple Slurm srun commands to launch parallel tasks.
See the Slurm sbatch and Slurm srun wiki pages for more details.

Before submitting jobs, please note the maximum number of jobs and resources allowed by your account's Quality of Service (QOS).
These numbers can be obtained from the QOS tab on the HPC Slurm dashboard page.
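
A minimal job script with the three parts mentioned above (scheduler parameters, setup commands, processing task) could look like the sketch below. All parameter values are hypothetical examples and must be adapted to your own QOS limits; the options and environment variables are standard Slurm ones.

```shell
#!/bin/bash
# Scheduler parameters (all values are hypothetical examples).
#SBATCH --job-name=example
#SBATCH --partition=main
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --output=slurm-%j.out    # %j is replaced by the job number

# Setup commands: Slurm sets these variables inside a job; the fallbacks only
# apply when the script is run by hand.
JOB="${SLURM_JOB_ID:-manual}"
NODE="${SLURMD_NODENAME:-$(uname -n)}"
CPUS="${SLURM_CPUS_PER_TASK:-1}"

# Processing task: replace this line with your own program.
echo "Job ${JOB} on node ${NODE} using ${CPUS} CPU core(s)"
```

Saved as job.sh (a hypothetical file name), the script would be submitted with sbatch job.sh. sbatch prints the job number, which can then be used with squeue or seff.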

Interactive Jobs

It is possible to request an interactive job, in which you can run small experiments. Use this only for a short time (max 1 hour).
For this you can use the additional Slurm sinteractive command.

Monitoring Jobs

The node monitoring commands below are provided by the software module monitor/node, which you should load beforehand. Check the Monitoring Computenodes page for more information.

During the job

You can monitor your jobs using :

- the HPC Slurm dashboard page
- realtime CPU monitor : top-node <nodename>
- realtime GPU monitor : nvtop-node <nodename>
- snapshot GPU monitor : nvidia-smi-node <nodename>

After the job

When your job is finished you can check :

- the content of your job's logfile
- your job's efficiency using : seff <jobnumber>
