EEMCS-HPC Cluster

One of the clusters at the University of Twente is the EEMCS-HPC Cluster. Funded by the DSI research institute (formerly known as CTIT), it started in 2017 as a joint operation of several research groups working on deep learning and AI methods. Over the years more groups joined, and the cluster was expanded with several more nodes containing multi-CPU/GPU combinations.

This HPC cluster is a collection of many separate servers (computers), called compute nodes, which are connected via a fast interconnect.

Different types of nodes serve different types of tasks. The HPC cluster described on this wiki has:

  • head nodes (login nodes), where users log in
  • multiple data nodes
  • CPU compute nodes
  • a “fat” CPU compute node with 2 TB of memory
  • CPU+GPU compute nodes (on these nodes, computations can run both on CPU cores and on Graphics Processing Unit (GPU) cards)

All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space. The difference between a personal computer and a cluster node lies in the quantity, quality and power of these components.

For more information on the list of used hardware for this cluster, see the EEMCS-HPC Hardware page.

This cluster is based on the Slurm scheduler 19.05.5 running on Ubuntu 20.04 LTS.

You can connect to either of the head nodes: hpc-head1.ewi.utwente.nl or hpc-head2.ewi.utwente.nl.
See the connection info page on how to connect.

Do NOT log in to the compute nodes, either directly or through ssh; log in ONLY on one of the head nodes!

To monitor your jobs and their progress, use the EEMCS-HPC Slurm dashboard page or command line tools such as squeue or scontrol.
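As a minimal sketch, the two command line tools can be wrapped in small shell functions, for example in your ~/.bashrc on a head node. The function names myjobs and jobinfo are hypothetical; the squeue format string only selects standard fields (job id, partition, job name, state, elapsed time, and node list or pending reason).

```bash
# Hypothetical helper functions, e.g. for ~/.bashrc on a head node.
# squeue and scontrol are the standard Slurm command line tools.

# List your own jobs: job id, partition, job name, state, elapsed time,
# and the node list (running jobs) or the reason a job is still pending.
myjobs() {
  squeue -u "$USER" -o "%.10i %.12P %.20j %.8T %.10M %R"
}

# Show the full scheduler record of one job, e.g. "jobinfo 123456".
jobinfo() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: jobinfo <jobid>" >&2
    return 1
  fi
  scontrol show job "$1"
}
```

After sourcing, myjobs lists your queue and jobinfo 123456 shows one job in detail (123456 is a placeholder job id).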

To use specific resources, please check the EEMCS-HPC features, resources and partitions page.

See the Slurm/HPC scheduler info page for more information.

Upcoming maintenance:

  • 24 April 2024
  • .. Sept/Oct 2024

During the maintenance day, the whole cluster will go offline.

For smaller experiments and interactive jobs, please try other resources, such as:

or external providers, such as:

Employees, students and guests who are not members of a participating group are allowed as well, but get a basic priority factor.

To get access, you need an AD account of the University of Twente. All students and employees have such an account, and one can be arranged for external persons. To get your AD account enabled for these clusters, contact one of the contact persons.

During the first year after an investment, access to the group partitions listed below is limited to the funding groups, who reach their nodes through these partitions.

The HPC/Slurm cluster contains the following partitions:

  Partition   Available to
  main        All (default)
  dmb         eemcs-dmb
  ram         eemcs-ram
  bdsi        bms-bdsi
  mia         eemcs-mia
  am          eemcs-(dmmp/macs/mast/mia/mms/sor/stat)
  mia-pof     eemcs-mia & tnw-pof
  students    eemcs-students

See the partition option on the EEMCS-HPC specifics page for how to select one of these partitions.
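A partition is selected with the standard Slurm option --partition (short: -p) of sbatch and srun. As a sketch, assuming your account belongs to the group behind the dmb partition (dmb is only an example from the table), the directive below lets Slurm choose between dmb and main; with a comma separated list, Slurm uses the listed partition that can start the job earliest.

```bash
#!/bin/bash
# Job script header (fragment). A group partition such as dmb only
# accepts jobs from members of the corresponding group.
# With a comma separated list, Slurm uses the partition that can
# start the job earliest.
#SBATCH --partition=dmb,main

# The same choice on the command line, where it overrides the directive:
#   sbatch --partition=dmb,main job.sh
```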

Participating groups that have invested in the HPC cluster get a higher priority than non-participating groups.
To gain more priority, your group can invest in the HPC cluster. Depending on the kind of investment, this results in:

  • Sole use of the purchased compute node(s) for approximately one year.
  • A higher priority factor related to the total amount of investment; when the hardware is retired (after roughly 8 years), the priority factor is reduced again.
  • A higher Quality of Service (QOS) for participants.

This combination ensures more priority and more computing time on the cluster.

Please consult the corresponding contact person for this:

Admin Page

See the EEMCS-HPC Admin page for more information.

For staff, the username is usually your family name followed by your initials; for students, it is your student number starting with an “s”; guest accounts start with an “x”.

The DSI Computing Lab does not store your password and cannot reset it. If you need help with your password, please contact the ICTS/LISA Servicedesk.

Two mailing lists exist for the HPC/Slurm cluster:

  • EEMCS-Hpc-Cluster-Users (all the users)
  • EEMCS-Hpc-Cluster-Managers (all the managers)

Access to DSI Computing Lab resources is provided via secure shell (SSH) login.

Most Unix-like operating systems (macOS, Linux, etc.) provide an ssh client by default, which you can start by typing the command ssh in a terminal window.

You can connect to either of the head nodes: hpc-head1.ewi.utwente.nl or hpc-head2.ewi.utwente.nl.

See the connecting page for more information.
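As a minimal sketch for your own computer (not the cluster), the function below wraps the ssh command and picks one of the two head nodes. The function name hpc and the variable HPC_USER are hypothetical; set HPC_USER to your AD username as described above.

```bash
# Hypothetical helper for ~/.bashrc on your own computer.
#   hpc     -> log in on hpc-head1.ewi.utwente.nl
#   hpc 2   -> log in on hpc-head2.ewi.utwente.nl
# HPC_USER must contain your AD username, e.g. your student number starting with "s".
hpc() {
  local head="${1:-1}"
  case "$head" in
    1|2) ;;
    *) echo "usage: hpc [1|2]" >&2; return 1 ;;
  esac
  if [[ -z "${HPC_USER:-}" ]]; then
    echo "set HPC_USER to your AD username first" >&2
    return 1
  fi
  ssh "${HPC_USER}@hpc-head${head}.ewi.utwente.nl"
}
```

Having two head nodes means you can simply switch with hpc 2 when the first one is busy or unavailable.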

The cluster machines run Ubuntu Server 20.04 LTS, with some basic packages from the repositories installed. Additional software is available through module files.

See the EEMCS-HPC Software page for more information.

The following folders are available:

  • Network-wide personal folder:
    • /home/<username> home folder: this is where you store your code and project-related materials; a “small” amount of data is allowed in your home folder.
  • Network-wide global folder:
    • /deepstore/datasets dataset folder: the location for mainly static and/or large data(sets). New folders are available on request.
  • Local scratch folder:
    • /local/<username> or /local/<projectname> scratch folder: you may create a local folder to store temporary data.
    • Use this space for intermediate data. Keep in mind that this data is temporary and that you must remove it yourself at the end of your job.
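The job script below is a minimal sketch of that workflow, assuming a per-job subfolder of /local/<username>; the folder layout (job-<jobid>), the function names and the commented-out workload paths are hypothetical. The trap removes the scratch folder whenever the script ends, and the work only starts when the script runs as a Slurm job (SLURM_JOB_ID is set).

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --time=02:00:00

# Create a scratch folder for one job under the given base folder and print its path.
# Hypothetical layout: <base>/job-<jobid>
make_scratch() {
  local base="$1" job="$2"
  local dir="${base}/job-${job}"
  mkdir -p "$dir" && echo "$dir"
}

main() {
  # Local scratch folder on the compute node (see the folder list above)
  SCRATCH_DIR=$(make_scratch "/local/${USER}" "$SLURM_JOB_ID") || exit 1
  # Remove the scratch data whenever this script ends
  trap 'rm -rf "$SCRATCH_DIR"' EXIT

  # Hypothetical workload: copy input in, compute, copy results you keep back to /home
  # cp "$HOME/project/input.dat" "$SCRATCH_DIR/"
  # ... run your computation here, writing intermediate files to $SCRATCH_DIR ...
  # cp "$SCRATCH_DIR/result.dat" "$HOME/project/"
}

# Only do the work when running inside a Slurm job
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  main
fi
```

Copying only the final results back to /home keeps your home folder small, which also helps with the quota described below.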

Quota

Quota is enabled on the /home/<username> folder, which limits the amount of data you can store in your personal folder.

  • Below 1 TB: this is fine; keep your data below this threshold.
  • Between 1 TB and 2 TB: once your folder exceeds 1 TB you get a warning and a grace period of 4 weeks; after that, writing is blocked.
  • Over 2 TB: writing is blocked and you will have to remove data.
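As a sketch, you can compare your own usage against these thresholds with du. The helper names are hypothetical, and the code assumes decimal units (1 TB = 10^12 bytes); du reports disk usage as seen from your side, which may differ somewhat from what the quota system counts, so the quota warning itself stays authoritative.

```bash
# Hypothetical helpers; thresholds from the quota list above.
# Assumption: 1 TB = 10^12 bytes.
TB_BYTES=1000000000000

# Disk usage of a folder (default: your home folder) in bytes.
usage_bytes() {
  du -s --block-size=1 "${1:-$HOME}" | cut -f1
}

# Map a usage in bytes to the quota state described above.
quota_status() {
  local bytes="$1"
  if (( bytes < TB_BYTES )); then
    echo "ok"
  elif (( bytes <= 2 * TB_BYTES )); then
    echo "warning"
  else
    echo "blocked"
  fi
}
```

Usage: quota_status "$(usage_bytes)". On a large home folder du can take a while, so avoid running it in a tight loop.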

Batch Jobs

The Slurm command sbatch submits a job script for later execution.
The script typically contains the scheduler parameters, setup commands and the processing task(s), or, if required, multiple Slurm srun commands to launch parallel tasks.
See the Slurm sbatch and Slurm srun wiki pages for more details.
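A minimal job script with these three parts might look like the one below, written to job.sh from the head node shell. The file name, job name, resource amounts, module name and train.py are hypothetical examples, and the --gres line assumes the GPUs are configured as the generic resource gpu; check the EEMCS-HPC features, resources and partitions page for the actual names and limits.

```bash
# Write a minimal job script to job.sh (all names and amounts are examples).
cat > job.sh <<'EOF'
#!/bin/bash
# Scheduler parameters
#SBATCH --job-name=my-experiment
#SBATCH --partition=main
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=04:00:00
#SBATCH --output=slurm-%j.out
# One GPU, assuming the GPUs are exposed as the generic resource "gpu"
#SBATCH --gres=gpu:1

# Setup commands: load software through module files (module name is an example)
module load python

# Processing task, started as a Slurm job step
srun python3 train.py
EOF
```

Submit it with sbatch job.sh; sbatch replies with the job id, which you can then use with squeue, scontrol or seff. The %j in the --output line is replaced by that job id.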

Before submitting jobs, check the maximum number of jobs and resources allowed by your account's Quality of Service (QOS).
These limits are listed on the QOS tab of the EEMCS-HPC Slurm dashboard page.
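Besides the dashboard, the standard Slurm tool sacctmgr can usually show the same limits from the command line. This sketch wraps two read-only queries in a hypothetical function myqos; depending on the privacy settings of the cluster, some information may be hidden, in which case the dashboard remains the reference.

```bash
# Hypothetical helper: show which QOS applies to you and what its limits are.
myqos() {
  # Accounts, partitions and QOS levels associated with your user
  sacctmgr show assoc where user="$USER" format=Account,Partition,QOS%40
  # Limits per QOS: priority, max jobs and resources per user, max wall time
  sacctmgr show qos format=Name,Priority,MaxJobsPU,MaxTRESPU%40,MaxWall
}
```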

Interactive Jobs

You can request an interactive job, in which you can run small experiments. Use this only for a short time (max. 1 hour).
For this, use the additional Slurm command sinteractive.
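The options of sinteractive are not described here, so as an alternative sketch using only standard Slurm, srun --pty starts an interactive shell inside a job allocation. The function name interactive and the default of 2 CPU cores are hypothetical; the time limit follows the 1 hour maximum above.

```bash
# Hypothetical helper: interactive shell on a compute node via standard srun.
#   interactive      -> 2 CPU cores, at most 1 hour
#   interactive 8    -> 8 CPU cores, at most 1 hour
interactive() {
  local cpus="${1:-2}"
  srun --pty --time=01:00:00 --cpus-per-task="$cpus" bash -i
}
```

Because the shell runs inside a scheduled job, this is not the same as logging in to a compute node with ssh. Type exit as soon as you are done to end the job and release the resources.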

The monitoring commands below are provided by the software module monitor/node, which you need to load beforehand. Check the Monitoring Computenodes page for more information.

During the job

You can monitor your running jobs with:

  • real-time CPU monitor: top-node <nodename>
  • real-time GPU monitor: nvtop-node <nodename>
  • snapshot GPU monitor: nvidia-smi-node <nodename>
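As a sketch, the node name for these commands can be taken from squeue. The hypothetical function gpujob loads the monitor/node module, looks up the node of your first running job and opens nvtop-node on it. For a job spanning several nodes, squeue prints a compressed node list (for example node[01-02]), which this sketch does not expand.

```bash
# Hypothetical helper: real-time GPU monitor for your first running job.
gpujob() {
  # The *-node monitoring commands come from this module
  module load monitor/node
  local node
  # -h: no header, -t RUNNING: running jobs only, %N: node list of the job
  node=$(squeue -h -u "$USER" -t RUNNING -o "%N" | head -n 1)
  if [[ -z "$node" ]]; then
    echo "no running jobs found" >&2
    return 1
  fi
  nvtop-node "$node"
}
```

Replacing nvtop-node with top-node or nvidia-smi-node gives the CPU monitor or the GPU snapshot for the same node.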

After the job

When your job has finished, you can check:

  • the content of your job's log file
  • the job's efficiency, using: seff <jobnumber>
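Both checks can be combined in a small sketch. The function name jobreport is hypothetical; it assumes the default log file name slurm-<jobid>.out in the current folder (the name Slurm uses when --output is not set), and it adds the standard sacct accounting record next to seff.

```bash
# Hypothetical helper: summary of a finished job, e.g. "jobreport 123456".
jobreport() {
  local id="${1:-}"
  if [[ -z "$id" ]]; then
    echo "usage: jobreport <jobid>" >&2
    return 1
  fi
  # Last lines of the log, assuming the default name slurm-<jobid>.out
  if [[ -f "slurm-${id}.out" ]]; then
    tail -n 20 "slurm-${id}.out"
  fi
  # CPU and memory efficiency of the job
  seff "$id"
  # Accounting record per job step: state, exit code, run time, peak memory
  sacct -j "$id" --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
}
```

A low CPU or memory efficiency in the seff output is a hint to request fewer cores or less memory next time, which also shortens your waiting time in the queue.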