In order to upload data from your personal computer to the Hadoop cluster you need to follow two steps. First you upload the data to the head node, and then you copy it from the head node to the Hadoop cluster.
Example:
scp <file path on the personal machine> <username>@wegdam.ewi.utwente.nl:~
Example:
ssh <username>@wegdam.ewi.utwente.nl
ls
hdfs dfs -put <source path on the head node>
hdfs dfs -ls
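As a complete illustration of both steps, assume a hypothetical file data.csv in your home directory and a hypothetical username s1234567 (both are placeholders, not real values):
# on your personal machine: copy the file to your home directory on the head node
scp ~/data.csv s1234567@wegdam.ewi.utwente.nl:~
# log in to the head node
ssh s1234567@wegdam.ewi.utwente.nl
# check that the file arrived on the head node
ls
# copy the file from the head node into your home directory on HDFS
hdfs dfs -put data.csv
# verify that the file is now on the Hadoop file system
hdfs dfs -ls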
Attention! While you are logged in to the head node you can use normal Unix commands (e.g. ls, cd, rm, …). When you want to access files on the Hadoop file system, you have to put the prefix hdfs dfs before all commands.
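For example, the following two listings refer to two different locations (shown only to illustrate the prefix; the actual output depends on your account):
# lists your home directory on the head node (normal Unix file system)
ls
# lists your home directory on the Hadoop file system
hdfs dfs -ls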
To download data, you have to follow the reverse of the procedure mentioned above. First you get the data from the Hadoop cluster to the head node, and then you copy it from the head node to your local machine.
Example: assuming that you are logged in to the head node
hdfs dfs -get <path of the file on the Hadoop cluster>
ls
Example:
scp <file name on the head node> <username>@<your machine IP>:<destination path>
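As a complete illustration of the download, using the same hypothetical file data.csv and a hypothetical local account jane at IP 130.89.1.2 with destination ~/Downloads (all placeholders):
# on the head node: copy the file from HDFS to your home directory on the head node
hdfs dfs -get data.csv
# check that the file is now on the head node
ls
# copy the file from the head node to your local machine
scp data.csv jane@130.89.1.2:~/Downloads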
Attention! Please remove everything that you copied to the head node and the Hadoop cluster afterwards, in order to keep them clean!
Example: assuming that you are logged in to the head node
Removing files from the head node:
rm <filename>
Removing files from the Hadoop cluster:
hdfs dfs -rm <filename>
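If you copied a whole directory, you can remove it recursively. This is only a sketch with a placeholder directory name results:
# remove a directory and its contents from the head node
rm -r results
# remove a directory and its contents from the Hadoop cluster
hdfs dfs -rm -r results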
To start an interactive Spark shell on the cluster, run the following command on the head node:
/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=<a number higher than 4050> --queue <your username>
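For instance, with a hypothetical username s1234567 and port 4051 (placeholder values, pick your own), the command filled in would look like this:
/usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=4051 --queue s1234567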
Authors: