
EEMCS-Hadoop Cluster

The Hadoop cluster is currently offline! We are in the process of developing a new installation.

Introduction

The DSI (Digital Society Institute, formerly CTIT) computing lab is an environment that also contains a Hadoop cluster.

The Hadoop cluster can be used for running large-scale computations, but because of the nature of Hadoop (run times of jobs on a shared cluster are not reproducible), it should not be used for benchmarking.

For more information on the cluster, see the hardware page.

Login Nodes

You can connect to one of the following head nodes:

Yarn Scheduler

See the Hadoop/Yarn page for more information. To monitor jobs and their progress, you can use the “”….
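Besides a web interface, the YARN ResourceManager of a standard Apache Hadoop installation exposes a REST API (`/ws/v1/cluster/apps`) and a command-line equivalent (`yarn application -list`). The Python sketch below polls that REST endpoint for one user's applications. It rests on assumptions: the ResourceManager hostname is not given on this page, so `RM_HOST` is a placeholder, and port 8088 is only the Hadoop default, which the new installation may not use.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Default ResourceManager web port in Apache Hadoop; the new installation may differ.
DEFAULT_RM_PORT = 8088

# (application id, name, state, progress in percent)
AppSummary = Tuple[str, str, str, float]


def apps_url(rm_host: str, port: int = DEFAULT_RM_PORT, user: Optional[str] = None) -> str:
    """Build the ResourceManager REST URL that lists applications, optionally per user."""
    url = "http://{}:{}/ws/v1/cluster/apps".format(rm_host, port)
    if user:
        url += "?" + urllib.parse.urlencode({"user": user})
    return url


def summarize_apps(payload: dict) -> List[AppSummary]:
    """Extract (id, name, state, progress) from a /ws/v1/cluster/apps response.

    The ResourceManager answers {"apps": null} when there are no applications.
    """
    apps = payload.get("apps") or {}
    return [
        (app["id"], app["name"], app["state"], float(app["progress"]))
        for app in apps.get("app", [])
    ]


def fetch_apps(rm_host: str, port: int = DEFAULT_RM_PORT,
               user: Optional[str] = None, timeout: float = 10.0) -> List[AppSummary]:
    """Query the ResourceManager over HTTP and summarize its applications."""
    with urllib.request.urlopen(apps_url(rm_host, port, user), timeout=timeout) as resp:
        return summarize_apps(json.loads(resp.read().decode("utf-8")))
```

For example, `fetch_apps("RM_HOST", user="s1234567")` would return one tuple per application of the hypothetical student account s1234567; an empty list means no applications were found.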

Maintenance

Upcoming maintenance:

During the maintenance day, the whole cluster will go offline.

Access

Who has access?

Members of the EEMCS-BSS, EEMCS-FMT, EEMCS-DMB, EEMCS-PS, EEMCS-RAM, EEMCS-SCS and BMS-BDSI groups are automatically granted access, as are people with whom members of these groups cooperate.

To get access, you need an AD (Active Directory) account of the University of Twente. All students and employees have such an account, and accounts can be arranged for external persons. To get your AD account enabled for these clusters, contact one of the contact persons.

Contact persons

Credentials

Accounts

For staff, the username is most likely your family name followed by your initials; for students, it is your student number prefixed with “s”; for guest accounts, it starts with “x”.

The DSI Computing Lab does not store your password and cannot reset it. If you need help with your password, please contact the ICTS/LISA Servicedesk.

Mailing list

A mailing list for the Hadoop cluster has been created on the UTwente list server.

Connecting to the cluster

Access to the Hadoop cluster is provided via secure shell (SSH) login.

Most Unix-like operating systems (Mac OS X, Linux, etc.) provide an ssh client by default, which you can start by typing the command ssh in a terminal window.

See the connecting page for more information.
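If you want to script remote commands from your own machine, a small Python wrapper around the system ssh client is enough. This is a minimal sketch under assumptions: the head node hostnames are not listed on this page, so `HEADNODE` is a placeholder, and `s1234567` is a hypothetical student username following the convention described under Accounts. It relies on ssh being on your PATH and leaves password or key handling to ssh itself.

```python
import subprocess
from typing import List, Optional


def ssh_command(username: str, host: str, remote_cmd: Optional[str] = None) -> List[str]:
    """Build the argument list for 'ssh username@host [remote_cmd]'."""
    cmd = ["ssh", "{}@{}".format(username, host)]
    if remote_cmd:
        cmd.append(remote_cmd)
    return cmd


def run_remote(username: str, host: str, remote_cmd: str) -> str:
    """Run one command on a head node and return its standard output.

    Raises subprocess.CalledProcessError if ssh or the remote command fails.
    """
    result = subprocess.run(
        ssh_command(username, host, remote_cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.6+
        check=True,
    )
    return result.stdout
```

Interactively you would simply type `ssh s1234567@HEADNODE`; `run_remote("s1234567", "HEADNODE", "hostname")` does the same for a single command and returns its output.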

Setting up

Software

The Hadoop cluster machines run Ubuntu Server 18.04 LTS. Some basic packages from the Ubuntu repositories have been installed. Additional software is available in the /software folder.

See the software page for more information.

Storage

The following folders are available:

Usage

Hadoop/YARN

For a quick start with the Hadoop software, see Hadoop Quick Start and More Hands on Experience. For more information, contact Jan Flokstra.
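As an illustration of a first Hadoop job, the classic word count can be written in Python and run with Hadoop Streaming, which pipes input records through any executable via stdin and stdout. This is a minimal sketch, not taken from Hadoop Quick Start or More Hands on Experience; the file name `wordcount.py` and its `map`/`reduce` arguments are illustrative choices.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run as 'wordcount.py map' or 'wordcount.py reduce'."""
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; Hadoop sorts mapper output by key before this step."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "{}\t{}".format(word, total)


def main(argv: List[str]) -> int:
    stages = {"map": mapper, "reduce": reducer}
    if len(argv) != 2 or argv[1] not in stages:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
        return 2
    for record in stages[argv[1]](sys.stdin):
        print(record)
    return 0


# Only run when a stage argument is given, so the functions can also be imported and tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

You can test it locally without Hadoop via `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a standard Apache Hadoop installation the streaming jar usually lives under `$HADOOP_HOME/share/hadoop/tools/lib/`, so a submission would look roughly like `hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar -files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" -input INPUT_DIR -output OUTPUT_DIR`, where INPUT_DIR and OUTPUT_DIR are placeholders; check the software page for the actual paths on this cluster.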

Also attached is “models.tar.gz”, an archive of a large set of models we benchmarked LTSmin with. Furthermore, we have attached “analyse-experiments.php”, a script that can analyse the stdout and stderr output of thousands of experiments in seconds. It can be used as a reference, although it may not suit your needs. The main output of the script is a set of CSV files containing the results of the experiments; it also generates some LaTeX code to quickly include these CSV files in your LaTeX documents.
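The PHP script itself is not reproduced here. Purely to illustrate the approach (turning many stdout/stderr files into one CSV table), here is a minimal Python sketch. It assumes a hypothetical layout in which each experiment NAME writes NAME.out and NAME.err and reports metrics as `key: value` lines; real LTSmin output and the attached script may differ, so adapt `METRIC_RE` to your own logs.

```python
import csv
import io
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical metric format: lines such as "states: 1234" or "time: 0.52".
METRIC_RE = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*:\s*(\S+)\s*$")


def parse_metrics(text: str) -> Dict[str, str]:
    """Collect 'key: value' pairs from one output file; a later duplicate key wins."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line)
        if match:
            metrics[match.group(1)] = match.group(2)
    return metrics


def collect(results_dir: Path) -> List[Dict[str, str]]:
    """Build one row per experiment by merging the metrics of NAME.out and NAME.err."""
    rows = []
    for out_file in sorted(results_dir.glob("*.out")):
        row = {"experiment": out_file.stem}
        row.update(parse_metrics(out_file.read_text(errors="replace")))
        err_file = out_file.with_suffix(".err")
        if err_file.exists():
            row.update(parse_metrics(err_file.read_text(errors="replace")))
        rows.append(row)
    return rows


def to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Render rows as CSV; columns are all keys seen, with 'experiment' first."""
    rows = list(rows)
    columns = ["experiment"] + sorted({key for row in rows for key in row} - {"experiment"})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")  # missing metrics stay empty
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `print(to_csv(collect(Path("results"))))` would print one row per experiment found in a hypothetical results directory, with empty cells for metrics an experiment did not report.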