TensorFlow

Select the Python version you want to use: Python 3.8 (the system default), or a different version loaded through the corresponding Environment Module.

Install TensorFlow using pip3:

pip3 install tensorflow

To install an older version (1.15 or earlier), you have to choose between the CPU and the GPU package:

pip3 install tensorflow==1.15      # CPU
pip3 install tensorflow-gpu==1.15  # GPU
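
After installing, it can help to confirm that the interpreter you selected actually sees the package. The following is a minimal sketch, not an official check: the function name check_tensorflow is made up for this page, and it only relies on python3 being on your PATH and on the standard tf.__version__ attribute.

```shell
#!/bin/bash
# check_tensorflow is a hypothetical helper name, not part of TensorFlow or pip.
# Usage: check_tensorflow [python-interpreter]   (defaults to python3)
check_tensorflow() {
  local py="${1:-python3}"
  # Try the import quietly first, so a missing package gives a short message
  # instead of a Python traceback.
  if "$py" -c 'import tensorflow' >/dev/null 2>&1; then
    "$py" -c 'import tensorflow as tf; print("tensorflow", tf.__version__)'
  else
    echo "tensorflow cannot be imported with '$py'" >&2
    return 1
  fi
}
```

Calling check_tensorflow without an argument tests python3; pass another interpreter name if you loaded a different Python version through an Environment Module.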

You can load the NVIDIA CUDA toolkits using the appropriate Environment Modules, for example:

module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0

Note: loading cuda-xx.y automatically loads CUPTI as well.
Note: nodes that contain GPUs have the GPU drivers loaded by default.

To run a CPU-only job on the compute nodes, you can use this sample sbatch file:

tensorflow-cpu.sbatch

#!/bin/bash
#SBATCH -c 8                            # number of CPU cores; 8 are requested here
#SBATCH --mail-type=END,FAIL            # email status changes
 
# diagnostic information
# display node name
echo "nodename :"
hostname
 
# check whether GPUs are assigned; if not, set an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
  export CUDA_VISIBLE_DEVICES=""
fi
# display which GPUs we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '${CUDA_VISIBLE_DEVICES}'"
 
# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py

To run a job with GPU support on the compute nodes, you can use this sample sbatch file:

tensorflow-gpu.sbatch

#!/bin/bash
#SBATCH -c 8                            # number of CPU cores; 8 are requested here
#SBATCH --gres=gpu:1                    # number of GPUs; 1 is requested here
#SBATCH --mail-type=END,FAIL            # email status changes
 
# diagnostic information
# display node name
echo "nodename :"
hostname
 
# check whether GPUs are assigned; if not, set an empty list
if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
  export CUDA_VISIBLE_DEVICES=""
fi
# display which GPUs we are allowed to use
echo "CUDA_VISIBLE_DEVICES = '${CUDA_VISIBLE_DEVICES}'"
 
# load nvidia cuda toolkit(s)
module load nvidia/cuda-11.3
module load nvidia/cuda-11.3_cudnn-8.2
module load nvidia/cuda-11.3_tensorrt-8.0
 
# run tensorflow
echo "Starting Tensorflow: "
python3 mycode.py
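
The diagnostic part of the script prints the raw CUDA_VISIBLE_DEVICES value. If you would rather see how many GPUs the job received, a small helper along these lines could be added to the script. This is only a sketch: count_visible_gpus is a hypothetical name, and it assumes the variable holds a comma-separated list of device identifiers.

```shell
#!/bin/bash
# count_visible_gpus is a hypothetical helper, not a Slurm or CUDA command.
# Usage: count_visible_gpus [list]   (defaults to $CUDA_VISIBLE_DEVICES)
count_visible_gpus() {
  local devices="${1-${CUDA_VISIBLE_DEVICES-}}"
  # An empty or unset list means no GPU was assigned to the job.
  if [ -z "$devices" ]; then
    echo 0
    return 0
  fi
  # Entries are separated by commas, so the count is the number of commas + 1.
  local commas
  commas=$(printf '%s' "$devices" | tr -cd ',' | wc -c)
  echo $((commas + 1))
}
```

In the sbatch file this could be used as: echo "GPUs assigned: $(count_visible_gpus)", placed after the existing CUDA_VISIBLE_DEVICES check.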

To temporarily disable GPU support in the tensorflow-gpu.sbatch file, request zero (0) GPUs:

#SBATCH --gres=gpu:0                    # number of GPUs; none are requested here

If you have problems running TensorFlow, check the following in sequence first:

What is in the output file(s) of your job?
Are the required modules loaded correctly?
Do you see an error like the one below? It is caused by the CUDA location being hardcoded within TensorFlow.

Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.

If so, add the following line to your submit script before starting your Python script:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_HOME
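
The export only helps if CUDA_HOME is set and really contains the nvvm/libdevice directory named in the error message. A guarded variant might look like the sketch below. The function name set_xla_cuda_dir is hypothetical, and it assumes that the CUDA Environment Module is the one setting CUDA_HOME.

```shell
#!/bin/bash
# set_xla_cuda_dir is a hypothetical helper, not part of TensorFlow or XLA.
# Usage: set_xla_cuda_dir [cuda-dir]   (defaults to $CUDA_HOME)
set_xla_cuda_dir() {
  local cuda_dir="${1-${CUDA_HOME-}}"
  # Only set the flag when the libdevice directory exists under cuda_dir.
  if [ -n "$cuda_dir" ] && [ -d "$cuda_dir/nvvm/libdevice" ]; then
    export XLA_FLAGS="--xla_gpu_cuda_data_dir=$cuda_dir"
    echo "XLA_FLAGS = '$XLA_FLAGS'"
  else
    echo "no nvvm/libdevice directory under '$cuda_dir'; XLA_FLAGS left unchanged" >&2
    return 1
  fi
}
```

Call set_xla_cuda_dir after the module load lines and before python3 mycode.py. A failure message here usually means the CUDA module was not loaded in the job.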
