EEMCS-Hadoop Cluster

The Hadoop cluster is currently offline! We are in the process of developing a new installation.

The DSI (Digital Society Institute, formerly CTIT) computing lab is an environment that also contains a Hadoop cluster.

  • This second cluster is a Hadoop cluster scheduled by YARN.
  • 8 nodes with FDR Infiniband
  • 16 nodes with QDR Infiniband

The Hadoop cluster can be used for running large-scale computations, but because of the nature of Hadoop it should not be used for benchmarking.

For more information on the cluster, see the hardware page.

You can connect to one of the following headnodes :

  • ctithead1.ewi.utwente.nl
  • ctithead2.ewi.utwente.nl

See the Hadoop/Yarn page for more information. To monitor the jobs and progress you can use the “”….

Upcoming maintenance :

  • t.b.d.

During the maintenance day, the whole cluster will go offline.

Members of the EEMCS-BSS, EEMCS-FMT, EEMCS-DMB, EEMCS-PS, EEMCS-RAM, EEMCS-SCS and BMS-BDSI groups are automatically granted access, as are people with whom members of these groups cooperate.

To get access, you need an AD account of the University of Twente. All students and employees have such an account, and accounts can be arranged for external persons. To get your AD account enabled for these clusters, contact one of the contact persons.

For staff, the username is probably your family name followed by your initials; for students it is your student number starting with “s”; for guest accounts it starts with “x”.

DSI Computing Lab does not store your password and we are unable to reset your password. If you require password assistance, please visit the ICTS/LISA Servicedesk.

A mailing list for the Hadoop cluster has been created on the UTwente list server.

Access to the Hadoop cluster is provided via secure shell (SSH) login.

Most Unix-like operating systems (Mac OS X, Linux, etc.) provide an ssh utility by default, which you can run by typing the command ssh in a terminal window.

See the connecting page for more information.
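
As a minimal sketch of the login step, the following Python snippet builds the ssh command for one of the headnodes listed on this page. The helper name ssh_command and the username s1234567 are placeholders made up for illustration; the snippet only prints the command and does not connect.

```python
import shlex

# Headnodes as listed on this page.
HEADNODES = ["ctithead1.ewi.utwente.nl", "ctithead2.ewi.utwente.nl"]


def ssh_command(username, host=HEADNODES[0]):
    """Return the argument list for an interactive SSH login (hypothetical helper)."""
    if host not in HEADNODES:
        raise ValueError(f"unknown headnode: {host}")
    return ["ssh", f"{username}@{host}"]


# Placeholder username; substitute your own AD account name.
print(shlex.join(ssh_command("s1234567")))
```

The printed line can be pasted into a terminal. Passing HEADNODES[1] as the second argument selects the other headnode.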

The Hadoop cluster machines run Ubuntu Server 18.04 LTS. Some basic packages from the repositories have been installed. Additional software is available in the /software folder.

See the software page for more information.

The following folders are available :

  • Network wide personal folder :
    • Home folder : You can store a small amount of data within your home folder (/home/username)
    • Hadoop Cluster : Data for the Hadoop cluster can be placed on the Hadoop Distributed File System (HDFS)
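
To illustrate placing data on HDFS, here is a hedged Python sketch that assembles the standard hdfs dfs commands for creating a directory and uploading a file. The target layout /user/username/data, the file name input.csv and the helper hdfs_upload_commands are assumptions for illustration, not taken from this cluster's configuration; the snippet only prints the commands.

```python
import shlex


def hdfs_upload_commands(local_path, username, subdir="data"):
    """Return the hdfs commands to create a target directory and upload a file."""
    # Assumed HDFS home layout; check the actual layout on the cluster.
    target = f"/user/{username}/{subdir}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target],      # create the directory if missing
        ["hdfs", "dfs", "-put", local_path, target],  # copy the local file into HDFS
    ]


# Placeholder file and username, for illustration only.
for cmd in hdfs_upload_commands("input.csv", "s1234567"):
    print(shlex.join(cmd))
```

The commands would be run on a headnode after logging in. Keeping large data sets on HDFS rather than in the home folder matches the storage guidance above.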

For a quick start with the Hadoop software, see Hadoop Quick Start and More Hands on Experience. For more information, contact Jan Flokstra.

Also attached is “models.tar.gz”, an archive of a large set of models we benchmarked LTSmin with. Furthermore, we have attached “analyse-experiments.php”. This script can analyse the stdout and stderr output of thousands of experiments in seconds. It can be used as a reference (it may or may not suit your needs). The main output of the script is a set of CSV files with the results of the experiments. It also generates some LaTeX code so you can quickly include these CSV files in your LaTeX documents.
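
The general idea of turning experiment output into CSV can be sketched in Python as follows. This is not a port of analyse-experiments.php: the "name: number" line format, the function names and the column layout are all assumptions made for illustration, and the real experiment output may look different.

```python
import csv
import io
import re

# Hypothetical line format "name: number"; the real output format may differ.
METRIC = re.compile(r"^(\w+):\s*(\d+(?:\.\d+)?)\s*$")


def parse_output(text):
    """Collect numeric metrics from one experiment's captured output."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def to_csv(results):
    """Render {experiment name: metrics dict} as CSV text, one row per experiment."""
    fields = sorted({key for metrics in results.values() for key in metrics})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["experiment"] + fields,
                            lineterminator="\n")
    writer.writeheader()
    for name in sorted(results):
        # Metrics missing for an experiment are written as empty cells.
        writer.writerow({"experiment": name, **results[name]})
    return buf.getvalue()
```

Lines that do not match the assumed pattern are ignored, so free-form log text mixed into the output does not break the parse.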