In order to upload data from your personal computer to the Hadoop cluster you need to follow two steps. First you need to upload the data to the head node, and then copy it from the head node to the Hadoop cluster.

  • Step 1: Uploading data to the head node:

Example:

scp <file path on your personal machine> <username>@wegdam.ewi.utwente.nl:~
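
For instance, assuming your file is called data.csv and your username is s1234567 (both hypothetical values), the command would look like:

scp data.csv s1234567@wegdam.ewi.utwente.nl:~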
  • Step 2: Logging in to the head node and putting the data on the Hadoop cluster

Example:

ssh <username>@wegdam.ewi.utwente.nl
ls
hdfs dfs -put <source path on the head node>
hdfs dfs -ls
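
A concrete version of the above, again assuming the hypothetical file data.csv and username s1234567:

ssh s1234567@wegdam.ewi.utwente.nl
ls                      # data.csv should be listed in your home directory on the head node
hdfs dfs -put data.csv  # copy it to the Hadoop cluster (your HDFS home directory by default)
hdfs dfs -ls            # data.csv should now also be listed on the Hadoop cluster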

Attention! While you are logged in to the head node you can use the normal Unix commands (e.g. ls, cd, rm, ...). When you want to access files on the Hadoop file system, you have to put the prefix hdfs dfs before all commands.
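
For example, the following two commands look at different file systems (assuming you are logged in to the head node):

ls            # lists your home directory on the head node itself
hdfs dfs -ls  # lists your home directory on the Hadoop cluster (HDFS)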

To download data from the Hadoop cluster to your personal computer, you have to follow the reverse procedure of what is described above. First you need to get the data from the Hadoop cluster to the head node, and then copy it from the head node to your local machine.

  • Step 1: Getting the data from the Hadoop cluster to the head node:

Example: assuming that you are logged in to the head node

hdfs dfs -get <path of the file on the Hadoop cluster>
ls
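
For instance, assuming the hypothetical file data.csv is in your HDFS home directory:

hdfs dfs -get data.csv  # copies data.csv from the Hadoop cluster to the current directory on the head node
ls                      # data.csv should now be listed on the head node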
  • Step 2: Copying the file from the head node to your local machine

Example:

scp <file name on the head node> <username>@<your machine IP>:<destination path>
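
Note that this command is run on the head node, so it only works if your own machine accepts incoming SSH connections. Alternatively, you can run scp on your local machine and pull the file from the head node, for instance (using the hypothetical file name and username from above):

scp s1234567@wegdam.ewi.utwente.nl:~/data.csv .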

Attention! Please remove everything that you copied to the head node and the Hadoop cluster when you are done, in order to keep them clean!

Example: assuming that you are logged in to the head node.

Removing files from the head node:

rm filename

Removing files from the Hadoop cluster:

hdfs dfs -rm filename
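
For instance, assuming you copied the hypothetical file data.csv and also created a directory called results on the Hadoop cluster:

rm data.csv              # remove the file from the head node
hdfs dfs -rm data.csv    # remove the file from the Hadoop cluster
hdfs dfs -rm -r results  # remove a whole directory from the Hadoop cluster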

To start an interactive Spark shell on the cluster (in YARN client mode), you can run:

/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=<a number higher than 4050> --queue <your username>
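
For instance, assuming the hypothetical username s1234567 and port 4051:

/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=4051 --queue s1234567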

Authors:

  • Jair Santana, j.j.santanna@utwente.nl
  • Mitra Baratchi, m.baratchi@utwente.nl