Apptainer
Apptainer (formerly Singularity) is a simple container platform enabling users to install and run software that would otherwise be unsupported by the host environment.
On Klone, the apptainer command is available on all compute nodes, but it is not available on the login node.
By default, the apptainer command will use the system version of Apptainer:
which apptainer # prints /usr/bin/apptainer
apptainer --version # apptainer version 1.2.4-1.el8 (as of 2023-12-05)
To use a different version of Apptainer, load the appropriate Lmod module. For example, to use Apptainer version 1.1.5:
module load apptainer/1.1.5
which apptainer # prints /sw/apptainer/1.1.5/bin/apptainer
apptainer --version # apptainer version 1.1.5
To see all available versions of Apptainer, run:
module -t spider apptainer
The default module is indicated by the tag (D) following the version. For example, apptainer/local (D) indicates that the default version is local. (The local version is the version installed on the system.)
Using Apptainer
Interactive session
Bind paths
Apptainer containers are designed to be portable, so by default, they do not have access to all the files on the host system. To make files available to the container, you must bind them to the container. This is done using the --bind option of the apptainer command or the APPTAINER_BINDPATH environment variable.
By default, apptainer will make several directories available within the container by binding them to the same path in the container so that /somefolder on the host is available as /somefolder within the container.
These directories include:
- $HOME (your home directory, a.k.a. ~)
- /tmp (temporary directory – unique to each node; contents are purged when the users’ last job running on the node completes)
- $PWD (the current working directory, i.e., the directory you are in when you run apptainer)
The Klone Apptainer installation is also configured to bind several other directories to the same path in the container, including:
- /mmfs1 (the main filesystem)
- /scr (the scratch filesystem, same as /tmp)
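As a rough illustration of how bind specifications are put together, the sketch below assembles a value for APPTAINER_BINDPATH from (source, destination) pairs. It assumes the src[:dest] form accepted by --bind, with multiple entries separated by commas. The helper name make_bindpath and the /gscratch/mylab/data path are hypothetical.

```python
def make_bindpath(binds):
    """Join (source, destination) pairs into a comma-separated bind spec.

    A destination of None means "bind at the same path inside the container".
    """
    specs = []
    for src, dest in binds:
        specs.append(src if dest is None or dest == src else f"{src}:{dest}")
    return ",".join(specs)


# Hypothetical example paths; substitute directories you actually use.
bindpath = make_bindpath([("/gscratch", None), ("/gscratch/mylab/data", "/data")])
print(f'export APPTAINER_BINDPATH="{bindpath}"')
```

The printed line can be pasted into a shell session before starting the container.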
We recommend setting the following binds before running a container:
export APPTAINER_BINDPATH="/gscratch" # Make /gscratch available within the container
To start an Apptainer container interactively, run the following:
apptainer shell <path_to_container>
Pulling an Apptainer image from docker.io registry
apptainer pull docker://<image_name>[:<tag>] # Pulls the image from docker registry
Practical examples
Using Apptainer to run a Python script
Let’s try running a Python script using Apptainer. The release notes for Python version 3.11 say that it is “between 10-60% faster than Python 3.10”. Is this true? Let’s find out! We’ll write a simple Python script to test this and use Apptainer to run it on Python 3.10 and Python 3.11.
speedtest.py
#!/usr/bin/env python3
import os, sys, timeit
# Get the values of the environment variables M and N, or use default values:
n = int(os.getenv("N", default=1000)) # How many numbers to join
m = int(os.getenv("M", default=100)) # How many times to run the test
# Function to test:
def join_nums():
    return "-".join([str(i) for i in range(n)])
# Print details about the test to stderr:
print(
    f"Running join_nums() {n}*{m} times on Python v{sys.version_info.major}.{sys.version_info.minor}",
    file=sys.stderr,
)
# Run the test:
result = timeit.timeit(join_nums, number=m)
# Print the result:
print(result)
1. The os.getenv function retrieves the value of an environment variable or returns a default value if the variable is not set.
2. Here, we use the file parameter to print(), which makes it write to the standard error stream (stderr) instead of the default standard output stream (stdout). This is useful because we want to print the result of the test to stdout so that we can save the output by redirecting it to a file, but we also want to print some information about the test to stderr so that it doesn’t get mixed up with the result.
3. timeit.timeit() runs a function multiple times and returns the total time, in seconds, that all of the runs took (not the average per run).
4. The print function prints to the standard output stream (stdout) by default. We can redirect this to a file in bash using the > operator.
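As a quick check of what timeit.timeit() reports, here is a small sketch, separate from speedtest.py, showing that the return value is the total elapsed time for all runs and that a per-call average has to be computed by dividing by number. The values of n and m are arbitrary choices for this illustration.

```python
import timeit

n, m = 1000, 100  # arbitrary sizes chosen for this illustration


def join_nums():
    return "-".join([str(i) for i in range(n)])


total = timeit.timeit(join_nums, number=m)  # total seconds for all m calls
per_call = total / m                        # average seconds per call
print(f"total: {total:.6f} s, per call: {per_call:.9f} s")
```

Because the script prints the total, results are only comparable between runs that use the same values of M and N.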
You can use the nano command to create a new file and paste the contents of the file into the terminal. To do this, run nano <filename> in the terminal, paste the contents of the file into the terminal, and press Ctrl+X to exit. You will be asked if you want to save the file. Press Y to save the file and Enter to confirm the filename.
Now let’s run the script using Python 3.10 and Python 3.11. For the image, we will use the python:3.10-slim and python:3.11-slim images from the official Python images.
The python:3.10-slim and python:3.11-slim images are tagged with the version of Python they contain. This means that the python:3.10-slim image will always contain Python 3.10, no matter when you pull it or which newer versions of Python have been released since.
The python:*-slim images are designed to contain only the minimal set of packages required to run Python and are much smaller than the standard Python images (~45 MiB vs ~350 MiB).
We’ll use the apptainer exec command to run the script inside an Apptainer container:
apptainer exec docker://python:3.10-slim python3 speedtest.py
1. The apptainer exec command runs a command inside an Apptainer container. The docker://python:3.10-slim argument tells Apptainer to use the python:3.10-slim image from the Docker registry. The python3 speedtest.py argument tells Apptainer to run the python3 command inside the container and provide it with the argument speedtest.py.
You should see something like:
Running join_nums() 1000*100 times on Python v3.10
0.003154174002702348
1. The first line is printed to stderr because we used print(..., file=sys.stderr) in the script.
2. The second line is printed to stdout.
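The separation of the two streams can be checked without a container. The sketch below runs a tiny child Python process that writes a message to stderr and a number to stdout, then captures each stream separately. The child program is a stand-in for speedtest.py, not the script itself.

```python
import subprocess
import sys

# Stand-in for speedtest.py: a message on stderr, a result on stdout.
child_code = 'import sys; print("info message", file=sys.stderr); print(0.5)'

proc = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # collect stdout and stderr separately
    text=True,            # decode bytes to str
    check=True,           # raise if the child exits with an error
)
print("stdout:", proc.stdout.strip())
print("stderr:", proc.stderr.strip())
```

Only the text captured from stdout would end up in a file when the output is redirected with >.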
It probably didn’t take very long to run the script. Let’s try running it again with larger values of M and N. We can set the values of M and N by setting them as environment variables in the bash command prompt before running the script:
export M=1000 N=100_000
apptainer exec docker://python:3.10-slim python3 speedtest.py
1. The export command sets the values of the M and N environment variables and makes them available to other programs like apptainer. We’re setting N to 100_000 instead of 100000 because underscores can be used in Python to make large numbers easier to read. This is a feature of Python, not the shell (which interprets 100_000 as just a string and not a number).
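The point about underscores can be verified directly: Python’s int() accepts single underscores between digits in a string, so the value travels through the environment as text and is only interpreted as a number inside the script. A minimal sketch, setting the variable from Python instead of from the shell:

```python
import os

os.environ["N"] = "100_000"           # what `export N=100_000` stores: a string
raw = os.getenv("N", default="1000")  # still the literal text "100_000"
n = int(raw)                          # int() accepts underscores between digits
print(repr(raw), "->", n)
```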
It should take a bit longer to run this time.
Now, let’s try running the script using Python 3.11:
apptainer exec docker://python:3.11-slim python3 speedtest.py
We don’t need to set the values of M and N again because they are still set from the previous command via export (unless you closed your terminal window or logged out).
It probably took a bit less time to run the script this time.
Is Python 3.11 really faster than Python 3.10? Let’s find out by redirecting the output of the script to a file:
apptainer exec docker://python:3.10-slim python3 speedtest.py > py3.10.txt
apptainer exec docker://python:3.11-slim python3 speedtest.py > py3.11.txt
cat py3.10.txt py3.11.txt # show the results
1. In bash, the > operator redirects the output of a command to a file. If the file already exists, it will be overwritten. If you want to append to an existing file instead, use the >> operator. These operators can be used with any command, not just apptainer. For example, ls > my_files.txt will write the output of ls to the file my_files.txt.
2. The cat command prints the contents of a file to stdout. If you want to print the contents of multiple files, you can list them all as arguments to cat.
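For comparison, the overwrite-versus-append distinction between > and >> corresponds to the "w" and "a" file modes in Python. This sketch writes to a temporary directory, and the file name results.txt is a hypothetical choice.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.txt")  # hypothetical file name
    with open(path, "w") as f:   # like `>`: create or overwrite
        print("first", file=f)
    with open(path, "w") as f:   # overwriting discards "first"
        print("second", file=f)
    with open(path, "a") as f:   # like `>>`: append to the end
        print("third", file=f)
    with open(path) as f:
        contents = f.read()

print(contents, end="")
```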
We see the results, but they’re not very easy to interpret. Let’s pipe the output to Python to calculate the speedup:
cat py3.10.txt py3.11.txt | apptainer exec docker://python:3.11-slim python3 -c 'print(float(input())/float(input()))'
1. The | operator pipes the output of one command to the input of another. In this case, we are piping the output of cat py3.10.txt py3.11.txt to the input of apptainer exec docker://python:3.11-slim python3 -c 'print(float(input())/float(input()))'. The -c option of the python3 command tells Python to run the code provided as an argument. The code print(float(input())/float(input())) reads two lines of input from standard input, converts them to floating point numbers, divides the first by the second, and prints the result.
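The one-liner can also be written as a small function, which makes the direction of the ratio explicit: the first number is the Python 3.10 time and the second is the Python 3.11 time, so a result above 1 means that 3.11 was faster. The function name speedup and the sample timings below are made up for illustration.

```python
import io


def speedup(stream):
    """Read two timings (one per line) and return first / second."""
    baseline = float(stream.readline())
    candidate = float(stream.readline())
    return baseline / candidate


# Made-up timings standing in for the contents of py3.10.txt and py3.11.txt.
sample = io.StringIO("2.6\n2.0\n")
print(f"{speedup(sample):.2f}x")
```

To use it with real results, pass sys.stdin (or an open file) in place of the StringIO object.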
In my case, Python 3.11 was about 1.3 times faster than Python 3.10. Not bad!
Here’s a demo of the above commands. Note that I used M=200 and N=50_000 instead of M=1000 and N=100_000 because it takes a long time to run the script with the larger values of M and N.
Writing a definition file to build a custom Apptainer image
What if we wanted to add a command-line interface to make it possible to run the script with different values of M and N without having to set them as environment variables? A user might want to set the values of M and N as command-line arguments. Maybe they would also like to be able to specify the output file instead of redirecting the output to a file. And how about a progress bar? Users love progress bars!
This sounds like a tall order, but it’s quite easy to do in Python using the click package for Python. Here’s the new script:
speedtest-cli.py
#!/usr/bin/env python3
import os, sys, timeit
import click
# Set up the context for the command line interface and add the options:
@click.command()
@click.option(
    "-n", # The name of the option
    envvar="N", # Use the environment variable `N` if it exists
    default=1000, # Default value if `N` is not set
    help="How many numbers to join", # Help text for the `-n` option
    type=int, # Convert the value to an integer
)
@click.option(
    "-m",
    envvar="M",
    default=100,
    help="How many times to run the test",
    type=int,
)
@click.option(
    "--output",
    envvar="OUTPUT_FILE", # Use the environment variable `OUTPUT_FILE` if it exists
    default="/dev/stdout", # Write the result to stdout by default
    help="Path to the output file",
    type=click.Path(writable=True, dir_okay=False),
)
def speedtest(m, n, output):
    """Run a speed test.""" # Help text for the command
    def join_nums(): # The function to test
        return "-".join([str(i) for i in range(n)])
    print(f"Running join_nums() {n}*{m} times on Python v{sys.version_info.major}.{sys.version_info.minor}", file=sys.stderr)
    bar = click.progressbar( # Create a progress bar
        length=n * m,
        update_min_steps=n, # Update the progress bar every `n` steps
        file=sys.stderr, # Print the progress bar to stderr
    )
    def join_nums_progress(): # Wrap the function to add the progress bar
        result = join_nums()
        bar.update(n) # Increment the progress bar by `n` steps
        return result
    result = timeit.timeit(join_nums_progress, number=m) # Run the test
    bar.render_finish() # Stop rendering the progress bar
    with open(output, "w") as f: # Open the output file for writing
        print(result, file=f) # Write the result to the output file
    if output != "/dev/stdout": # Show the output path if it's not stdout
        print(f"Result written to {output}", file=sys.stderr)
if __name__ == "__main__": # Run the script if it's executed directly
    speedtest()
1. We set the default value of --output to /dev/stdout so that the script will print the result to stdout by default.
2. click provides a Path type that can be used to validate paths. The writable=True option tells click that the path must be writable. The dir_okay=False option tells click that the path must not be a directory.
3. The @click.command() decorator tells click that the first function definition that follows defines a command. The @click.option() decorators add options to the command-line interface. click passes the values of these options to the function using the names of the options (lowercased and with - replaced by _).
4. The __name__ == "__main__" check ensures that the script is only run if it is executed directly and not if it is imported as a module. This is useful if you want to use the script as a module in another script. We don’t need to do this here, but it’s a good habit to get into.
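The precedence that envvar gives an option (command-line value first, then the environment variable, then the default) can be illustrated without click. The sketch below uses the standard library’s argparse as a stand-in. It is an analogy for the behaviour of the -n and -m options, not how click is implemented, and parse_args_with_env is a hypothetical helper.

```python
import argparse


def parse_args_with_env(argv, environ):
    """Parse -n and -m, falling back to environment variables N and M."""
    parser = argparse.ArgumentParser(description="Run a speed test.")
    parser.add_argument("-n", type=int, default=int(environ.get("N", 1000)),
                        help="How many numbers to join")
    parser.add_argument("-m", type=int, default=int(environ.get("M", 100)),
                        help="How many times to run the test")
    return parser.parse_args(argv)


env = {"N": "50_000"}                          # as if `export N=50_000` was run
print(parse_args_with_env([], env))            # N from env, M from default
print(parse_args_with_env(["-n", "10"], env))  # command line wins over env
```

With click, the same lookup order comes from passing envvar and default to @click.option(), so no manual fallback code is needed.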
Now, let’s try running the script with apptainer exec:
apptainer exec docker://python:3.11-slim python3 speedtest-cli.py
You probably got a ModuleNotFoundError because the click package is not installed in the python:3.11-slim image. If you know some Python, you might think, “I could probably fix this if I install the click module via pip.” And you would be right – if you were running the script in the version of Python that’s normally installed on your computer. But we’re running the script inside an Apptainer container, so we need to install the click module inside the container. Furthermore, we’re using two different versions of Python, so we need to install the click module in the containers for both versions.
We’ll solve this by writing an Apptainer definition file to build a custom Apptainer image that contains the click module. Here’s the definition file:
speedtest.def
Bootstrap: docker # Where to get the base image from.
From: python:{{ PY_VERSION }}-slim # Which container to use as a base image.
%arguments
# The version of Python to use:
PY_VERSION=3.10
%files
speedtest-cli.py /opt/local/bin/speedtest # Copy the script to the container.
%post
# Create a virtual environment in /opt/venv to install our dependencies:
/usr/local/bin/python -m venv /opt/venv
# Install `click` and don't cache the downloaded files:
/opt/venv/bin/pip install --no-cache-dir click
# Print a message to stderr to let the user know that the installation is done:
echo "$(/opt/venv/bin/python3 --version): Done installing dependencies." >&2
# Make the `speedtest` command executable:
chmod +x /opt/local/bin/speedtest
%environment
export PATH="/opt/local/bin:$PATH" # Add the directory with the `speedtest` command to the PATH.
export PATH="/opt/venv/bin:$PATH" # Add the virtual environment to the PATH.
%runscript
# Run the Python script with the arguments passed to the container:
speedtest "$@"
%test
# Run the speedtest command to check if it works:
speedtest -m 2000 -n 1000
1. Each definition file must start with a Bootstrap line that tells Apptainer where to get the base image from. In this case, we’re using the docker bootstrap, which means that we’re getting the base image from the Docker registry.
2. We’re using the python:{{ PY_VERSION }}-slim image as the base image. The {{ PY_VERSION }} part refers to a template variable and will be replaced with the value of the PY_VERSION build argument when we build the image. This means we can use the same definition file to build images for different versions of Python.
3. The definition file is split into sections. The %arguments section defines the build arguments for the image. We’re using the PY_VERSION argument to specify the version of Python to use, and we’re setting the default value to 3.10.
4. The %files section defines the files to copy inside the container. We’re copying the speedtest-cli.py script to /opt/local/bin/speedtest in the container. /opt/local/bin is a good place to put scripts that you want to be able to run from anywhere in the container, but you should add the directory to the PATH environment variable if you want to be able to run the script without specifying the full path to the script. We’re removing the .py extension from the script name because we want to be able to run the script by typing speedtest instead of speedtest-cli.py.
5. The %post section defines the commands to run inside the container after the base image has been downloaded. This is where you should install any dependencies that your script needs.
6. We’re using the python -m venv command to create a virtual environment in /opt/venv, which is where we will install the click module.
7. We run pip from the virtual environment to install the click module. We use the --no-cache-dir option to tell pip not to cache the downloaded files. This is useful because we don’t need the downloaded files after we’ve installed the click module – otherwise, they would just take up space in the built image.
8. We’re using the echo command to print a message to stderr to let the user know that the installation is done. We’re using the >&2 operator to redirect the output of echo to stderr instead of stdout. The $(...) syntax runs the commands within and inserts the output into a string. In this case, we’re using it to insert the output of python3 --version into the string we’re about to print with echo. This helps the user know which version of Python the built image will contain.
9. We’re using the chmod command to make the speedtest command executable – otherwise, we wouldn’t be able to run it without passing it to the python3 command.
10. The %environment section defines the environment variables to set in the container. Note that the environment variables set in the %environment section are only set when the container is run. They are not set when the image is built, so they are not available in the %post section, even if you move the %environment section above the %post section.
11. We’re prepending /opt/local/bin to the existing value of the PATH environment variable so that we can run the speedtest command from anywhere in the container. The PATH environment variable is used by the shell to find commands. If you want to be able to run a command without specifying the full path to the command, you need to add the directory containing the command to the PATH environment variable, separated by a colon (:).
12. Adding the /opt/venv/bin directory to the PATH environment variable makes it possible to run the python command from the virtual environment we created in the %post section. This is necessary because we installed the click module in the virtual environment, so we need to run the python command from the virtual environment to be able to import the click module.
13. The %runscript section defines the command to run when the container is run. In this case, we’re running the speedtest command that we copied into the container in the %files section and made executable in the %post section.
14. We’re using the "$@" special parameter to pass all the arguments passed to the container to the speedtest command – for example, if you run apptainer run speedtest.sif --output output.txt, the speedtest command will be run with the arguments --output output.txt.
15. The %test section defines the commands to run at the end of the build to check that the image works (they can also be run later with apptainer test). In this case, we’re running the speedtest command to make sure that it works.
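To illustrate what the template variable does conceptually, the sketch below fills {{ NAME }} placeholders from a dictionary of defaults that explicit build arguments can override. This is only a model of the behaviour described in the notes on the definition file, not Apptainer’s implementation, and render_template is a hypothetical helper.

```python
import re


def render_template(text, defaults, build_args=None):
    """Replace {{ NAME }} placeholders, letting build_args override defaults."""
    values = {**defaults, **(build_args or {})}
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], text)


header = "From: python:{{ PY_VERSION }}-slim"
defaults = {"PY_VERSION": "3.10"}  # mirrors the %arguments section

print(render_template(header, defaults))                          # default
print(render_template(header, defaults, {"PY_VERSION": "3.11"}))  # overridden
```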
Now we can build the image using the apptainer build command. The first argument is the name of the image to build. The second argument is the path to the definition file:
apptainer build speedtest-py3.10.sif speedtest.def
It might take a little while to build the image. If everything goes according to plan, you should see the output of a test run at the end of the build process when the %test section is run.
When it’s done, the image will be saved as speedtest-py3.10.sif in the current directory. This image encapsulates the script and all its dependencies, so we can run it on any system that has Apptainer installed without having to worry about whether the system has a specific version of Python or the click module installed. Furthermore, because we defined a %runscript section in the definition file, we can run the script without having to specify the python3 command.
To run the containerized script, we can run the apptainer run command, which is similar to apptainer exec but launches the commands in the %runscript section of the definition file instead of the command specified as an argument:
apptainer run speedtest-py3.10.sif
You should see a progress bar and the result of the test.
Because the container image is marked as an executable, we can also run it directly without having to specify the apptainer command (although it will still run using Apptainer):
./speedtest-py3.10.sif
Try it with some different values of M and N:
apptainer run speedtest-py3.10.sif -m 30_000 -n 1000
Now try specifying a different output file:
apptainer run speedtest-py3.10.sif -m 30_000 -n 1000 --output py3.10.txt
cat py3.10.txt # Show the results
How do we build the image for Python 3.11? We could copy the definition file and change PY_VERSION to 3.11, but that would leave us with two nearly identical files to maintain. Instead, we can use the --build-arg option of the apptainer build command to pass the value of PY_VERSION as a build argument to the definition file:
apptainer build --build-arg PY_VERSION=3.11 speedtest-py3.11.sif speedtest.def
Now we can run the script using the new image:
apptainer run speedtest-py3.11.sif -m 30_000 -n 1000
We can still use the environment variables M and N to set the values of m and n:
export M=30_000 N=1000
apptainer run speedtest-py3.11.sif
This makes it easier to run both containers with the same values of M and N without having to specify them each time.
Because click reads an option’s value from its envvar when the option is not given on the command line, an --output option declared with envvar="OUTPUT_FILE" can also be set through the OUTPUT_FILE environment variable instead of --output:
export OUTPUT_FILE=py3.11-2.txt
apptainer run speedtest-py3.11.sif
unset OUTPUT_FILE # Unset the OUTPUT_FILE environment variable so that it doesn't affect the next command
Because the image file is executable and we specified the %runscript section in the definition file, we can also run the script directly without having to specify the apptainer command:
./speedtest-py3.11.sif -m 30_000 -n 1000
Let’s recreate the comparison we did earlier:
export M=1000 N=100_000
apptainer run speedtest-py3.10.sif --output py3.10.txt
apptainer run speedtest-py3.11.sif --output py3.11.txt
cat py3.10.txt py3.11.txt | apptainer exec speedtest-py3.10.sif python3 -c 'print(float(input())/float(input()))'
1. We’re using apptainer exec instead of apptainer run because we want to run the python3 command instead of the command specified in the definition file.
Clearing the Apptainer cache
Apptainer caches all the images it downloads in a cache directory to avoid downloading them again. The cache can get quite large, so it’s a good idea to clear it from time to time.
You can see the size of the cache directory by running:
apptainer cache list
To clear the cache, run:
apptainer cache clean
That should free up some space.
Changing the Apptainer cache directory
By default, Apptainer stores the cache in ~/.apptainer/cache. This can be a problem if you have a small home directory (e.g., if you are using the default 10 GB quota on Klone). You can change the cache directory by setting the APPTAINER_CACHEDIR environment variable. For example, to set the cache directory to /tmp/<your-username>/apptainer-cache, you can use the $USER environment variable:
export APPTAINER_CACHEDIR="/tmp/$USER/apptainer-cache"
Apptainer will create the cache directory if it does not already exist. Keep in mind that /tmp is unique to each node and is purged when your last job on the node completes, so a cache stored there does not persist between jobs.
You’ll need to set the APPTAINER_CACHEDIR environment variable every time you want to use Apptainer, so it’s a good idea to add it to your ~/.bashrc file so that it is always set when you log in.
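If you want to check from a script how much space a cache directory occupies, a sketch like the following walks the directory and sums the file sizes. It uses the default location ~/.apptainer/cache as an example. dir_size is a hypothetical helper, and apptainer cache list remains the authoritative source.

```python
import os


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


cache_dir = os.path.expanduser("~/.apptainer/cache")  # default cache location
print(f"{cache_dir}: {dir_size(cache_dir) / 2**20:.1f} MiB")
```

If the directory does not exist, os.walk yields nothing and the reported size is 0.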