TensorFlow#
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.
TensorFlow can be installed through a number of Python package managers such as Conda or pip.
For use on Bede’s ppc64le nodes, the simplest method is to install TensorFlow using the Open-CE Conda distribution.
For the aarch64 nodes, using an NVIDIA-provided NGC TensorFlow container is likely preferred.
Installing via Conda (Open-CE)#
With a working Conda installation (see Installing Miniconda), the following instructions can be used to create a Python 3.8 conda environment named tf-env with the latest Open-CE provided TensorFlow:
Note
TensorFlow installations via conda can be relatively large. Consider installing your miniconda (and therefore your conda environments) to the /nobackup file store.
# Create a new conda environment named tf-env within your conda installation
conda create -y --name tf-env python=3.8
# Activate the conda environment
conda activate tf-env
# Add the OSU Open-CE conda channel to the current environment config
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/current/
# Also use strict channel priority
conda config --env --set channel_priority strict
# Install the latest available version of TensorFlow
conda install -y tensorflow
In subsequent interactive sessions, and when submitting batch jobs which use TensorFlow, you will then need to re-activate the conda environment.
For example, to verify that TensorFlow is available and print the version:
# Activate the conda environment
conda activate tf-env
# Invoke python
python3 -c "import tensorflow;print(tensorflow.__version__)"
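The same activation step is required within batch job scripts. The following is a minimal sketch only: the account name, partition, resource requests and miniconda install path are assumptions, so adjust them to match your project and the Bede usage documentation.
#!/bin/bash
# Sketch of a Slurm batch script using the tf-env conda environment.
# The account, partition, resource requests and miniconda path are
# assumptions - adjust them for your project.
#SBATCH --account=<project>
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# Re-activate the conda environment created above
source "${HOME}/miniconda3/etc/profile.d/conda.sh"
conda activate tf-env

# Confirm TensorFlow is importable and that a GPU is visible
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"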
Warning
Conda and pip builds of TensorFlow for aarch64 do not include CUDA support as of April 2024. For now, see Using NGC TensorFlow Containers or build from source.
Using NGC TensorFlow Containers#
Warning
NVIDIA do not provide ppc64le containers for TensorFlow through NGC. This method should only be used on the aarch64 partitions.
NVIDIA provide docker containers with CUDA-enabled TensorFlow builds for x86_64 and aarch64 architectures through NGC.
The NGC TensorFlow containers have included Hopper support since the 22.09 release.
For details of which TensorFlow version is provided by each container release, see the NGC TensorFlow container release notes.
Apptainer can be used to convert and run docker containers, or to build an apptainer container based on a docker container.
These can be built on the aarch64 nodes on Bede using Rootless Container Builds.
Note
TensorFlow containers can consume a large amount of disk space. Consider setting APPTAINER_CACHEDIR to an appropriate location in /nobackup, e.g. export APPTAINER_CACHEDIR=/nobackup/projects/${SLURM_JOB_ACCOUNT}/${USER}/apptainer-cache.
Note
The following apptainer commands should be executed from an aarch64 node only, i.e. on ghlogin, gh or ghtest.
Docker containers can be fetched and converted using apptainer pull, prior to using apptainer exec to execute code within the container.
# Pull and convert the docker container. This may take a while.
apptainer pull docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3
# Run a command in the container, e.g. to show the TensorFlow version
apptainer exec --nv docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3 python3 -c "import tensorflow; print(tensorflow.__version__);"
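The same command can also be run non-interactively from a batch job. The following is a sketch only: the account name, partition and resource requests are assumptions, so adjust them for your project.
#!/bin/bash
# Sketch of a Slurm batch script running TensorFlow inside the NGC container
# on the Grace-Hopper (aarch64) nodes. Account, partition and resource
# requests are assumptions - adjust them for your project.
#SBATCH --account=<project>
#SBATCH --partition=gh
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# Use /nobackup for the apptainer cache (see the note above)
export APPTAINER_CACHEDIR=/nobackup/projects/${SLURM_JOB_ACCOUNT}/${USER}/apptainer-cache

# Run a command inside the container, e.g. listing the visible GPUs
apptainer exec --nv docker://nvcr.io/nvidia/tensorflow:24.03-tf2-py3 python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"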
Alternatively, if you require more than just TensorFlow within the container, you can create an apptainer definition file.
E.g. for a container based on tensorflow:24.03-tf2-py3 which also installs HuggingFace Transformers 4.37.0, the following definition file could be used:
Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:24.03-tf2-py3
%post
# Install other python dependencies, e.g. hugging face transformers
python3 -m pip install transformers==4.37.0
%test
# Print the TensorFlow version and which GPU devices are visible
python3 -c "import tensorflow; print(tensorflow.__version__); print(tensorflow.config.list_physical_devices('GPU'));"
# Print the Transformers version, demonstrating it is available.
python3 -c "import transformers;print(transformers.__version__);"
Assuming this is named tf-transformers.def, a corresponding apptainer image file named tf-transformers.sif can then be created via:
apptainer build --nv tf-transformers.sif tf-transformers.def
Commands within this container can then be executed using apptainer exec.
For example, to see the version of transformers installed within the container:
apptainer exec --nv tf-transformers.sif python3 -c "import transformers;print(transformers.__version__);"
Or, in this case, due to the %test section of the container definition, run the test command:
apptainer test --nv tf-transformers.sif
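A script of your own can be run inside the container in the same way. For example (run_model.py is a hypothetical script in the current working directory):
# Run a hypothetical user script inside the built container
apptainer exec --nv tf-transformers.sif python3 run_model.py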
Further Information#
For further information on TensorFlow features and usage, please refer to the TensorFlow Documentation.