===== More Hands-on Experience =====
==== How to upload data to the Hadoop cluster ====
In order to upload data from your personal computer to the Hadoop cluster you need to follow two steps: first upload the data to the head node, and then move it from the head node to the Hadoop cluster.
* Step 1: Uploading data to the head node
Example (run on your local machine; replace the placeholders with your own username and file name):
scp <filename> <username>@wegdam.ewi.utwente.nl:~
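If you need to upload an entire directory instead of a single file, scp accepts the -r flag. A minimal sketch, assuming a hypothetical username jdoe and a hypothetical directory mydata:
scp -r mydata jdoe@wegdam.ewi.utwente.nl:~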
* Step 2: Logging in to the head node and putting the data on the Hadoop cluster
Example:
ssh <username>@wegdam.ewi.utwente.nl
ls                        # check that the file arrived on the head node
hdfs dfs -put <filename>  # copy it into your HDFS home directory
hdfs dfs -ls              # verify that the file is now on HDFS
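Note that hdfs dfs -put also accepts an explicit HDFS destination path; without one, the file ends up in your HDFS home directory (/user/<username>). A minimal sketch, assuming a hypothetical username jdoe and a hypothetical file dataset.csv:
hdfs dfs -put dataset.csv /user/jdoe/dataset.csv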
Attention! While you are logged in to the head node you can use the normal Unix commands (e.g. ls, cd, rm, ...). When you want to access files on the Hadoop file system (HDFS), however, you have to prefix every command with hdfs dfs. The pairs below illustrate the difference.
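For example (the directory name results is just a placeholder):
ls                       # lists files on the head node's local disk
hdfs dfs -ls             # lists files in your HDFS home directory
mkdir results            # creates a directory on the head node
hdfs dfs -mkdir results  # creates a directory on HDFS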
==== How to copy data from the Hadoop cluster to your local machine ====
You have to follow the reverse of the procedure described above: first get the data from the Hadoop cluster to the head node, and then copy it from the head node to your local machine.
* Step 1: Getting the data to the head node
Example (assuming that you are logged in to the head node):
hdfs dfs -get <filename>  # copy the file from HDFS to the head node
ls                        # verify that the file is now on the head node
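Like -put, hdfs dfs -get also accepts an explicit local destination. A minimal sketch, assuming a hypothetical HDFS file results/part-00000:
hdfs dfs -get results/part-00000 ./results.txt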
* Step 2: Copying the file from the head node to your local machine
Example (run on your local machine, not on the head node):
scp <username>@wegdam.ewi.utwente.nl:~/<filename> .
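With the same hypothetical names as before (jdoe, dataset.csv), this becomes:
scp jdoe@wegdam.ewi.utwente.nl:~/dataset.csv .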
Attention! Please remove everything that you copied to the head node and the Hadoop cluster once you are done, in order to keep them clean!
Example (assuming that you are logged in to the head node):
Removing files from the head node:
rm <filename>
Removing files from the Hadoop cluster:
hdfs dfs -rm <filename>
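To remove whole directories, both commands take a recursive flag. A minimal sketch, assuming a hypothetical directory mydata:
rm -r mydata            # removes a directory on the head node
hdfs dfs -rm -r mydata  # removes a directory on HDFS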
==== Running Apache Spark ====
The command below starts an interactive Spark shell on the cluster through YARN. Here <port> and <queuename> are placeholders: choose a free port for the Spark web UI and use the YARN queue you were assigned.
/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=<port> --queue <queuename>
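As a concrete sketch, assuming a hypothetical port 4041 and the queue name default (substitute the queue you were actually assigned):
/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=4041 --queue default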
Authors:
* Jair Santanna, j.j.santanna@utwente.nl
* Mitra Baratchi, m.baratchi@utwente.nl