====== Tensorflow ======

===== Installing Tensorflow =====

Select which Python version you want to use: Python 3.8 (the system default) or a different version using the correct **[[eemcs-hpc:software#optional_software|Environment Module]]** file.

Install TensorFlow using pip3:

<code bash>
pip3 install tensorflow
</code>

or, if you want to install an older version (for versions 1.15 and older you have to select the CPU or GPU package explicitly):

<code bash>
pip3 install tensorflow==1.15      # CPU
pip3 install tensorflow-gpu==1.15  # GPU
</code>

===== Required Environment Modules =====

You can load the NVIDIA CUDA toolkits using the appropriate **[[eemcs-hpc:software#optional_software|Environment Module]]**, for example:

<code bash>
module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0
</code>

//note: loading cuda-xx.y will automatically load CUPTI// \\
//note: nodes containing GPUs have the GPU drivers loaded by default.//

===== Submitting a Tensorflow Job =====

To run a job using **CPU only** support on the compute nodes you can use this sample sbatch file:

<code bash>
#!/bin/bash
#SBATCH -c 8                    # number of cores, here 8 are requested
#SBATCH --mail-type=END,FAIL    # email status changes

# diagnostic information
# display node name
echo "nodename :"
hostname

# check if gpus are assigned; if not, create an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    export CUDA_VISIBLE_DEVICES=""
fi

# display which gpus we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '"$CUDA_VISIBLE_DEVICES"'"

# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py
</code>

To run a job using **GPU** support on the compute nodes you can use this sample sbatch file:

<code bash>
#!/bin/bash
#SBATCH -c 8                    # number of cores, here 8 are requested
#SBATCH --gres=gpu:1            # number of gpus, here 1 is requested
#SBATCH --mail-type=END,FAIL    # email status changes

# diagnostic information
# display node name
echo "nodename :"
hostname

# check if gpus are assigned; if not, create an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    export CUDA_VISIBLE_DEVICES=""
fi

# display which gpus we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '"$CUDA_VISIBLE_DEVICES"'"

# load nvidia cuda toolkit(s)
module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0

# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py
</code>

If you want to temporarily disable GPU support in the **tensorflow-gpu.sbatch** file, you need to request zero (0) GPUs:

<code bash>
#SBATCH --gres=gpu:0            # number of gpus, here 0 are requested
</code>

===== Solving runtime issues =====

If you have problems running TensorFlow, check the following in sequence first:

  * What is in the output file(s) of your job?
  * Are the required modules loaded correctly?
  * Do you see errors like the one below (caused by a hardcoded CUDA location within TensorFlow)?

<code>
Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
</code>

Add the following line to your submit script before starting your Python script:

<code bash>
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_HOME
</code>
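
As an additional quick check (for example after installing TensorFlow, or when debugging the issues above), you can run a short script that prints the installed TensorFlow version and the GPUs TensorFlow detects. This is a minimal sketch for TensorFlow 2.x; the file name **check_tf.py** is only an example and is not part of the cluster setup.

<code python>
# check_tf.py - minimal sanity check (example file name)
# Prints the installed TensorFlow version and the GPUs TensorFlow can see.
# In a CPU-only job (or with CUDA_VISIBLE_DEVICES="") the GPU list will be empty.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))
</code>

You can run it on a compute node, for example by replacing **mycode.py** with **check_tf.py** in one of the sbatch files above; in a GPU job with the CUDA modules loaded, the GPU list should contain at least one device.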