Abel ROS Tutorial
From Robin
Abel is the super computer cluster at UiO. It gives user access to large amounts of hardware for experiments and supports many different software libraries out of the box. However, ROS is mainly supported on Ubuntu and as such is not supported on Abel. This document will describe the steps necessary to run ROS on Abel.
Containers
The underlying technology that enables us to run ROS on Abel is Linux Containers. Abel uses Singularity which is a container system optimized for use on super computers. The basics behind containers is that they contain our running process while exposing a limited file system. This enables the container to mimic any other Linux distribution allowing us to run Ubuntu on Abel.
Singularity
Singularity is the container system on Abel. As of this writing these versions are supported (read more about Abel modules)
    $ module avail singularity
    --------------------- /cluster/etc/modulefiles ---------------------
    singularity/2.3.1    singularity/2.3.2(default)    singularity/2.4
We are mainly interested in 2.4, which supports the network isolation that we need for ROS.
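To use it, load the module before building or running anything; a short sketch of the commands (the version string is taken from the `module avail` output above):

```shell
# Select Singularity 2.4 instead of the 2.3.2 default
module load singularity/2.4

# Confirm which version is now active
singularity --version
```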
Creating a container
The basis for most Singularity containers is the Singularity recipe[1]. This file describes how the container should be built and from what basis.
This document will not detail all the different possibilities, we will instead illustrate the most used options. Below is a condensed version of the Singularity script used to create custom containers.
All recipe files must have the following header
    Bootstrap: localimage
    From: /path/to/other/image
The header describes the basis of the image. For most of our purposes we will build from other local images, but we could also use Docker or one of the other methods.
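As an illustration, a header that bootstraps from Docker Hub instead of a local image could look like the sketch below (the `ros:kinetic` image name is our assumption, not something this guide prescribes):

```
Bootstrap: docker
From: ros:kinetic
```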
The next section usually needed is the %setup
section. When building ROS images this is where you would create a workspace and copy your ROS code into this workspace.
    %setup
        # Since this is run outside the container we need to know where the
        # container is built; $SINGULARITY_ROOTFS is the root of its file system
        mkdir -p $SINGULARITY_ROOTFS/workspace/src
        cp -r /path/to/local/code $SINGULARITY_ROOTFS/workspace/src/
The %setup
section is run outside of the Singularity container and has full access to the file system on the machine it is running on.
The next section needed is %post
. This section is executed inside the container and this is the time to build our workspace.
    %post
        # Source the ROS environment to get access to catkin (NOTE the .sh extension)
        . /opt/ros/kinetic/setup.sh
        # Change directory to our workspace (NOTE: since this runs inside the
        # container we don't need any special variables)
        cd /workspace/src
        # Create a new workspace
        catkin_init_workspace
        # Move back up to the workspace root
        cd /workspace
        # Build workspace
        catkin_make -j1
        # (Might be needed)
        catkin_make install
The final section that is needed before running on Abel is %runscript
. This section describes what should be done when a user executes singularity run
on the image. Usually this is where we would run our nodes to execute the experiment.
    %runscript
        # Source workspace
        . /workspace/devel/setup.sh
        # Run nodes
        roslaunch best_experiment_ever best_launch_file.launch
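Once the recipe is complete it must be built into an image. Building requires root, so this is done on your own machine rather than on Abel; a sketch using the Singularity 2.4 `build` command (file names are placeholders):

```shell
# Build an image from the recipe file `Singularity`
sudo singularity build ros_experiment.simg Singularity

# Test the %runscript locally before copying the image to Abel
singularity run ros_experiment.simg
```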
Necessary configuration for Abel
To run our container on Abel we need to ensure that the bind points that Abel expects are present. To support this, add the following to your %post section.
    %post
        mkdir -p /cluster /projects /work /usit
These folders will be used to bind the same folders into our container. This allows us to access home areas (e.g. /usit/abel/u1/jorgehn
) and work area (i.e. $SCRATCH
) among other things.
In addition, to support full network isolation we need to add the following to our %environment
    %environment
        # Tell ROS to use `localhost` instead of the actual machine address
        # (which is not accessible with network isolation)
        export ROS_HOSTNAME=localhost
        export ROS_MASTER_URI=http://localhost:11311
This ensures that ROS uses localhost
as address instead of the actual machine address which will not be accessible when using network isolation. On Abel we will use network isolation to be able to
run several tasks on the same node without the tasks communicating with each other.
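A quick way to verify these settings is to print the environment inside an isolated container; a sketch (the image name is hypothetical):

```shell
# Start the container with network isolation (-n) and list the ROS variables;
# both should point at localhost
singularity exec -n ros_experiment.simg env | grep ROS_
```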
The quick way
We have created a container with ROS Kinetic and Gazebo 7.8.1
that should be enough for most ROS experiments. You can find this in the shared robin area /robin/jorgehn/Patched Gazebo
.
This container can be used as a basis for a Bootstrap: localimage
header; it exposes everything necessary to run on Abel and has some conveniences (like sourcing setup.sh
automatically).
To use this simply download the image and have this as your header
    Bootstrap: localimage
    From: /path/to/ros_gazebo.simg
And from there create a %setup
section to copy in your custom node, a %post
to compile, and a final %runscript
to run your experiments. The workspace is located
in /workspace/src
and is exposed inside the container as $ROS_WORKSPACE
.
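Putting these pieces together, a minimal recipe on top of the shared image might look like the sketch below (all paths, package and launch file names are placeholders for your own):

```
Bootstrap: localimage
From: /path/to/ros_gazebo.simg

%setup
    # Copy our node into the workspace provided by the base image
    cp -r /path/to/my_node $SINGULARITY_ROOTFS/workspace/src/

%post
    # Source ROS and build the workspace
    . /opt/ros/kinetic/setup.sh
    cd /workspace
    catkin_make -j1

%runscript
    . /workspace/devel/setup.sh
    roslaunch my_node experiment.launch
```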
SLURM
SLURM is the scheduling software on Abel that is responsible for running our experiments on different machines (called nodes in SLURM). We will, again, not document everything that SLURM is capable of, but instead focus on what is needed to run Singularity images. It is recommended to read about the different command line tools that exist for SLURM for a better understanding of how to use the system [2].
Configuring SLURM
To illustrate how to use Singularity and SLURM we have reproduced two scripts below that can be used as inspiration. The first is our submit
script. This script is responsible for
configuring SLURM for our experiment, requesting resources and loading modules.
    #!/bin/bash
    # This script is meant to be submitted with `sbatch`

    ### Configuration for queue system ###
    # Account configuration
    #SBATCH --account=uio
    #SBATCH --job-name=map_gait_expr_v2
    # Node configuration
    #SBATCH --ntasks=5
    #SBATCH --cpus-per-task=2
    #SBATCH --mem-per-cpu=1024
    #SBATCH --time=0:5:0
    # Email configuration
    #SBATCH --mail-type=BEGIN,END
    #SBATCH --mail-user=jorgehn@student.matnat.uio.no

    ### Cluster setup ###
    # Setup environment on cluster
    source /cluster/bin/jobsetup
    # Clear inherited modules
    module purge
    # Load specific version of Singularity
    module load singularity/2.4
    # If anything fails we exit immediately
    set -o errexit

    ### Experiment configuration ###
    # Create result folder with date:
    START_DATE=`date +%Y-%m-%d+%H-%M`
    EXPERIMENT_FOLDER="$SCRATCH/$START_DATE"

    ### Initialize experiment ###
    # Create result folder
    mkdir -p $EXPERIMENT_FOLDER
    # Mark result folder for extraction
    cleanup "cp -r $EXPERIMENT_FOLDER $JOB_FOLDER"
    echo "Storing experiment results in '$EXPERIMENT_FOLDER'"

    ### Run experiment script ###
    srun --output=$EXPERIMENT_FOLDER/job-%2t.out singularity run -n singularity_image.simg
The most important part of the above script is the SLURM configuration. The account information is required to tell SLURM who to bill for the resources used. For most people this will be
uio
, but if you are part of a project with extra resources on Abel this should be changed. The job name is not required, but makes it easier to identify the job.
    # Account configuration
    #SBATCH --account=uio
    #SBATCH --job-name=map_gait_expr_v2
The node configuration below tells SLURM that we are requesting 5 tasks (meaning srun
will be run 5 times). We also request 2 CPUs per task. The memory request is required
and must be estimated (ROS + Gazebo requires around 1~2 GB of memory). The last part is the most important: the time you expect the experiment to run for (in HH:MM:SS format).
SLURM uses the time to estimate how and when it should schedule the experiment; request too much and you will wait a long time before the experiment is started, too little and your job
will be killed before it is done.
    # Node configuration
    #SBATCH --ntasks=5
    #SBATCH --cpus-per-task=2
    #SBATCH --mem-per-cpu=1024
    #SBATCH --time=0:5:0
The last configuration below is a simple convenience: it tells SLURM to send us an email when the job begins and ends.
    # Email configuration
    #SBATCH --mail-type=BEGIN,END
    #SBATCH --mail-user=jorgehn@student.matnat.uio.no
For all available options, see the sbatch documentation.
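Assuming the script above is saved as `submit.sh` (a name of our choosing), it is submitted and monitored with the standard SLURM command line tools:

```shell
# Submit the job; SLURM prints the assigned job id
sbatch submit.sh

# List our queued and running jobs
squeue -u $USER

# Cancel a job (id as reported by sbatch/squeue)
scancel 1234567
```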
Special notes for Singularity
You should always copy your image to $SCRATCH
. The $SCRATCH
file system is faster storage much closer to the nodes running the container, which means the image will
load faster. If many tasks run the same container (as in the example above) this can lead to significant time savings.
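In the submit script this copy could be placed just before the srun line; a sketch (image name as in the script above):

```shell
# Copy the image to the fast scratch area and run it from there
cp singularity_image.simg $SCRATCH/

srun --output=$EXPERIMENT_FOLDER/job-%2t.out \
    singularity run -n $SCRATCH/singularity_image.simg
```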
Also note that we ran Singularity with singularity run -n
; the -n
flag tells Singularity that the container should not have access to outside network resources.
This means that two ROS containers started on the same node will not see each other. This is nice since they will not get in each other's way, and we can run several containers on the same node.