===== More Hands-on Experience =====

==== How to upload data to the Hadoop cluster ====

To upload data from your personal computer to the Hadoop cluster, you need to follow two steps: first upload the data to the head node, and then move it from the head node onto the Hadoop cluster. In the examples below, ''<username>'' and ''<file>'' are placeholders for your own user name and file name.

  * Step 1: Upload the data to the head node:

  scp <file> <username>@wegdam.ewi.utwente.nl:~

  * Step 2: Log in to the head node and put the data on the Hadoop cluster:

  ssh <username>@wegdam.ewi.utwente.nl
  ls
  hdfs dfs -put <file>
  hdfs dfs -ls

Attention! While you are logged in to the head node, you can use the normal Unix commands (e.g. ls, cd, rm, ...). When you want to access files on the Hadoop file system, you have to put the prefix ''hdfs dfs'' before all commands.

==== How to copy data from the Hadoop cluster to your local machine ====

Follow the reverse of the procedure above: first get the data from the Hadoop cluster to the head node, and then copy it from the head node to your local machine.

  * Step 1: Get the data to the head node (assuming you are logged in to the head node):

  hdfs dfs -get <file>
  ls

  * Step 2: Copy the file from the head node to your local machine (run this on your local machine):

  scp <username>@wegdam.ewi.utwente.nl:~/<file> .

Attention! Please remove everything that you copied to the head node and the Hadoop cluster afterwards, in order to keep them clean! Assuming you are logged in to the head node:

  * Removing files from the head node:

  rm <filename>

  * Removing files from the Hadoop cluster:

  hdfs dfs -rm <filename>

==== Running Apache Spark ====

To start an interactive Spark shell on the cluster, launch it through YARN. The UI port and queue name were left unspecified on this page, so ''<port>'' and ''<queue>'' below are placeholders:

  /usr/lib/spark/bin/spark-shell --master yarn-client --num-executors 5 --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf spark.ui.port=<port> --queue <queue>

Once the shell is running, you can verify the setup with a small test job; see the sketch at the end of this page.

Authors:
  * Jair Santanna, j.j.santanna@utwente.nl
  * Mitra Baratchi, m.baratchi@utwente.nl
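==== Quick sanity check in the Spark shell ====

The following is a minimal sketch of a word count, run inside the Spark shell started above, to check that the cluster and HDFS are reachable. It assumes you have already uploaded a text file, here hypothetically named ''test.txt'', to your HDFS home directory with ''hdfs dfs -put''. The code is Scala, which is what spark-shell accepts; ''sc'' (the SparkContext) is created automatically by the shell.

  // Read the uploaded file from your HDFS home directory as an RDD of lines
  // ("test.txt" is a hypothetical name; use the file you actually uploaded)
  val lines = sc.textFile("test.txt")

  // Count the lines in the file
  println(s"Number of lines: ${lines.count()}")

  // A minimal word count: split each line into words, pair each word
  // with a count of 1, then sum the counts per word
  val wordCounts = lines
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // Print the first ten (word, count) pairs
  wordCounts.take(10).foreach(println)

If the line count and the (word, count) pairs are printed without errors, the Spark shell can read from HDFS and run jobs on the cluster.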