EEMCS-Hadoop Cluster
Introduction
The DSI (Digital Society Institute, formerly CTIT) computing lab also contains a Hadoop cluster.
- This second cluster is a Hadoop cluster whose jobs are scheduled by YARN.
Hardware Specifications AM Cluster
Node name | OS | Cores | Memory | CPU | Purchase date |
---|---|---|---|---|---|
spark1-8 | Ubuntu 22.04 | 64 cores (HT 128) | 1 TB | AMD EPYC 7713P 64-Core Processor | 28 June 2023 |
spark-nn | Ubuntu 22.04 | 16 cores (HT 32) | 64 GB | Xeon CPU E5-2630 v3 @ 2.40GHz | 28 April 2016 |
spark-snn | Ubuntu 22.04 | 16 cores (HT 32) | 64 GB | Xeon CPU E5-2630 v3 @ 2.40GHz | 28 April 2016 |
linux801 | Ubuntu 22.04 | 1 core | 4 GB | Xeon Gold 5118 CPU @ 2.30GHz | n/a (VM) |
The Hadoop cluster can be used for running large-scale computations, but because of the nature of Hadoop it should not be used for benchmarking.
The HDFS size is currently 924 TB.
For more information on the cluster, see the hardware page.
Login Nodes
You can connect to one of the following head nodes:
- The head nodes have been removed. They will be renewed before the new course year.
Yarn Scheduler
See the Hadoop/Yarn page for more information. To monitor the jobs and progress you can use the “”….
Maintenance
Upcoming maintenance:
- t.b.d.
During the maintenance day, the whole cluster will go offline.
Access
Who has access?
Members of the EEMCS-BSS, EEMCS-FMT, EEMCS-DMB, EEMCS-PS, EEMCS-RAM, EEMCS-SCS and BMS-BDSI groups are automatically granted access, as are people with whom members of these groups cooperate.
To get access, you need an AD account of the University of Twente. All students and employees have such an account, and one can be arranged for external persons. To get your AD account enabled for these clusters, contact one of the contact persons.
Contact persons.
- Jan Flokstra (EEMCS-DMB/HMI)
Credentials
Accounts
For staff, the username is probably your family name followed by your initials. For students, it is your student number prefixed with an “s”. Guest account names start with an “x”.
DSI Computing Lab does not store your password and we are unable to reset your password. If you require password assistance, please visit the ICTS/LISA Servicedesk.
Mailing list
A mailing list for the Hadoop cluster has been created on the UTwente list server.
Connecting to the cluster
Access to the Hadoop cluster is provided via secure shell (SSH) login.
Most Unix-like operating systems (Mac OS X, Linux, etc.) provide an ssh utility by default, which can be started by typing the command ssh in a terminal window.
See the connecting page for more information.
Setting up
Software.
The Hadoop cluster machines run Ubuntu 22.04 (see the hardware specifications above). Some basic packages from the repositories have been installed. Additional software is available in the */software* folder.
See the software page for more information.
Storage
The following folders are available:
- Network-wide personal folder:
- Home folder: You can store a small amount of data in your home folder (/home/username).
- Hadoop cluster: Data for the Hadoop cluster can be placed on the Hadoop Distributed File System (HDFS).
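As a rough illustration of placing data on HDFS, the sketch below builds the standard `hdfs dfs` commands for creating a directory, uploading a local file and listing the result. It only prints the commands and does not run them. The target location `/user/<username>` is the conventional Hadoop home directory and is an assumption about this cluster; the helper name, the example username and the file name are hypothetical as well.

```python
import shlex


def hdfs_upload_commands(username, local_path, subdir="data"):
    """Return the `hdfs dfs` commands that would upload local_path to HDFS."""
    # Assumption: the personal HDFS directory is /user/<username>.
    target_dir = f"/user/{username}/{subdir}"
    return [
        # Create the target directory (and any missing parents).
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],
        # Copy the local file into the target directory.
        ["hdfs", "dfs", "-put", local_path, target_dir],
        # List the directory to check that the file arrived.
        ["hdfs", "dfs", "-ls", target_dir],
    ]


if __name__ == "__main__":
    # Hypothetical student account and file name, for illustration only.
    for cmd in hdfs_upload_commands("s1234567", "measurements.csv"):
        print(shlex.join(cmd))
```

The printed commands can be pasted into a shell on a machine where the Hadoop client is installed, after adjusting the paths to your own account.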
Usage
Hadoop/YARN
For a quick start with the Hadoop software, see Hadoop Quick Start and More Hands on Experience. For more information, contact Jan Flokstra.
Analysing experiments (not related to SLURM)
Also attached is “models.tar.gz”. This is an archive of a large set of models we benchmarked LTSmin with. Furthermore, we have attached “analyse-experiments.php”. This script can analyse the stdout and stderr output of thousands of experiments in seconds. It can be used as a reference (it may or may not suit your needs). The main output of the script is a set of CSV files with the results of the experiments. Some LaTeX code is also generated so that these CSV files can be included quickly in your LaTeX documents.
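The general idea of turning experiment output into CSV can be sketched as follows. This is not the attached PHP script but a minimal Python illustration of the same kind of processing. The log format (lines of the form `key: value`), the function names and the sample outputs are all hypothetical.

```python
import csv
import io
import re

# Hypothetical log format: lines such as "states: 1024" or "time: 3.5".
LINE_RE = re.compile(r"^\s*(\w+)\s*:\s*(\S+)\s*$")


def parse_output(name, text):
    """Collect 'key: value' lines from one experiment's output into a row."""
    row = {"experiment": name}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            row[match.group(1)] = match.group(2)
    return row


def rows_to_csv(rows):
    """Render the rows as CSV text, using the union of all keys as header."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    buffer = io.StringIO()
    # Keys missing from a row are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up outputs of two experiments, for illustration only.
    outputs = {
        "model-a": "starting run\nstates: 1024\ntime: 3.5\n",
        "model-b": "states: 2048\nwarning: timeout\n",
    }
    rows = [parse_output(name, text) for name, text in outputs.items()]
    print(rows_to_csv(rows), end="")
```

Lines that do not match the assumed `key: value` pattern are ignored, so free-form log messages do not end up in the CSV.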