====== Tensorflow ======

===== Installing Tensorflow =====

Select which Python version you want to use: Python 3.8 (the system default) or a different version using the correct **[[eemcs-hpc:software#optional_software|Environment Module]]** file.

Install TensorFlow using pip3:

<code bash>
pip3 install tensorflow
</code>

or, if you want to install an older version (for versions 1.15 and older you have to select the CPU or GPU package explicitly):

<code bash>
pip3 install tensorflow==1.15      # CPU
pip3 install tensorflow-gpu==1.15  # GPU
</code>

===== Required Environment Modules =====

You can load the NVIDIA CUDA toolkits using the appropriate **[[eemcs-hpc:software#optional_software|Environment Module]]**, for example:

<code bash>
module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0
</code>

//note: loading cuda-xx.y will automatically load CUPTI// \\
//note: nodes containing GPUs have the GPU drivers loaded by default.//

===== Submitting a Tensorflow Job =====

To run a job using **CPU only** support on the compute nodes you can use this sample sbatch file:

<code bash>
#!/bin/bash
#SBATCH -c 8                    # number of cores, here 8 are requested
#SBATCH --mail-type=END,FAIL    # email status changes

# diagnostic information
# display node name
echo "nodename :"
hostname

# check if gpus are assigned; if not, create an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    export CUDA_VISIBLE_DEVICES=""
fi

# display which gpus we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '"$CUDA_VISIBLE_DEVICES"'"

# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py
</code>

To run a job using **GPU** support on the compute nodes you can use this sample sbatch file:

<code bash>
#!/bin/bash
#SBATCH -c 8                    # number of cores, here 8 are requested
#SBATCH --gres=gpu:1            # number of gpus, here 1 is requested
#SBATCH --mail-type=END,FAIL    # email status changes

# diagnostic information
# display node name
echo "nodename :"
hostname

# check if gpus are assigned; if not, create an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    export CUDA_VISIBLE_DEVICES=""
fi

# display which gpus we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '"$CUDA_VISIBLE_DEVICES"'"

# load nvidia cuda toolkit(s)
module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0

# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py
</code>

If you want to temporarily disable GPU support in the **tensorflow-gpu.sbatch** file, you need to request zero (0) GPUs:

<code bash>
#SBATCH --gres=gpu:0            # number of gpus, here 0 are requested
</code>

===== Solving runtime issues =====

If you have problems running TensorFlow, check the following in sequence first:

  * What is in the output file(s) of your job?
  * Are the required modules loaded correctly?
  * Do you see errors like the one below (caused by a hardcoded CUDA location within TensorFlow)?

<code>
Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
</code>

Add the following line to your submit script before starting your Python script:

<code bash>
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_HOME
</code>
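
As an additional quick check (for example after installing TensorFlow, or when debugging the issues above), you can run a short script that prints the installed TensorFlow version and the GPUs TensorFlow detects. This is a minimal sketch for TensorFlow 2.x; the file name **check_tf.py** is only an example and is not part of the cluster setup.

<code python>
# check_tf.py - minimal sanity check (example file name)
# Prints the installed TensorFlow version and the GPUs TensorFlow can see.
# In a CPU-only job (or with CUDA_VISIBLE_DEVICES="") the GPU list will be empty.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))
</code>

You can run it on a compute node, for example by replacing **mycode.py** with **check_tf.py** in one of the sbatch files above; in a GPU job with the CUDA modules loaded, the GPU list should contain at least one device.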