Abel ROS Tutorial

Abel is the supercomputer cluster at UiO. It gives users access to large amounts of hardware for experiments and supports many different software libraries out of the box. However, ROS is mainly supported on Ubuntu and as such is not supported on Abel. This document describes the steps necessary to run ROS on Abel.


Containers

The underlying technology that enables us to run ROS on Abel is Linux containers. Abel uses Singularity, a container system optimized for use on supercomputers. The basic idea behind containers is that they isolate our running process while exposing a limited file system. This enables the container to mimic any other Linux distribution, allowing us to run Ubuntu on Abel.

Singularity

Singularity is the container system on Abel. As of this writing the following versions are supported (read more about Abel modules):

$ module avail singularity

--------------------------------------------------------- /cluster/etc/modulefiles ---------------------------------------------------------
singularity/2.3.1          singularity/2.3.2(default) singularity/2.4

We are mainly interested in 2.4, which supports the network isolation we need for ROS.
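To use this version, load the module before building or running containers; the same commands appear in the submit script later in this document, and the version check is just a quick sanity test:

$ module purge
$ module load singularity/2.4
$ singularity --version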

Creating a container

The basis for most Singularity containers is the Singularity recipe[1]. This file describes how the container should be built and from what basis. This document will not detail all the different possibilities; instead we will illustrate the most used options. Below is a condensed version of the Singularity recipe used to create custom containers.

All recipe files must have the following header

Bootstrap: localimage
From: /path/to/other/image

The header describes the basis of the image. For most of our purposes we will build from other local images, but we could also use Docker or one of the other methods.
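For example, a header that bootstraps directly from Docker Hub instead of a local image could look like the sketch below; the ros:kinetic image name is just an example and not something this tutorial requires:

Bootstrap: docker
From: ros:kinetic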

The next section usually needed is the %setup section. When building ROS images this is where you would create a workspace and copy your ROS code into this workspace.

%setup
    # Since this is run outside the container we need to know where the container is built; $SINGULARITY_ROOTFS is the root of the container file system
    mkdir -p $SINGULARITY_ROOTFS/workspace/src
    cp -r /path/to/local/code $SINGULARITY_ROOTFS/workspace/src/

The %setup section is run outside of the Singularity container and has full access to the file system on the machine it is running on.

The next section needed is %post. This section is executed inside the container, and this is where we build our workspace.

%post
    # Source the ROS environment to get access to catkin (NOTE the .sh extension)
    . /opt/ros/kinetic/setup.sh
    # Change directory to our workspace (NOTE since this is inside the container we don't need any special variables)
    cd /workspace/src
    # Create a new workspace
    catkin_init_workspace
    # Move back up to the workspace root
    cd /workspace
    # Build the workspace
    catkin_make -j1
    # (Might be needed)
    catkin_make install

The final section that is needed before running on Abel is %runscript. This section describes what should be done when a user executes singularity run on the image. Usually this is where we would run our nodes to execute the experiment.

%runscript
    # Source the workspace
    . /workspace/devel/setup.sh
    # Run nodes
    roslaunch best_experiment_ever best_launch_file.launch

Necessary configuration for Abel

To run our container on Abel we need to ensure that the bind points Abel expects are present. To support this, add the following to your %post section.

%post
    mkdir -p /cluster /projects /work /usit

These folders will be used to bind the corresponding folders on Abel into our container. This allows us to access home areas (e.g. /usit/abel/u1/jorgehn) and the work area (i.e. $SCRATCH), among other things.
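A quick sanity check, assuming the finished image is called ros_image.simg, is to verify that the bind directories exist inside the container:

$ singularity exec ros_image.simg ls -d /cluster /projects /work /usit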

In addition, to support full network isolation we need to add the following to our %environment section:

%environment
    # Tell ROS to use `localhost` instead of actual machine address (which is not accessible with network isolation)
    export ROS_HOSTNAME=localhost
    export ROS_MASTER_URI=http://localhost:11311

This ensures that ROS uses localhost as its address instead of the actual machine address, which will not be accessible when using network isolation. On Abel we will use network isolation to be able to run several tasks on the same node without the tasks communicating with each other.
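To check that the variables are picked up, you can print the ROS-related environment from inside the container (again using ros_image.simg as a placeholder name); ROS_HOSTNAME and ROS_MASTER_URI should appear in the output:

$ singularity exec ros_image.simg env | grep ROS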

The quick way

We have created a container with ROS Kinetic and Gazebo 7.8.1 that should be enough for most ROS experiments. You can find it in the shared Robin area /robin/jorgehn/Patched Gazebo. This container can be used as the basis for a Bootstrap: localimage recipe; it exposes everything necessary to run on Abel and has some conveniences (like sourcing setup.sh automatically).

To use it, simply download the image and use this as your header:

Bootstrap: localimage
From: /path/to/ros_gazebo.simg

From there, create a %setup section to copy in your custom node, a %post section to compile it, and a final %runscript to run your experiments. The workspace is located in /workspace/src and is exposed inside the container as $ROS_WORKSPACE.
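Putting the pieces together, a minimal recipe based on the shared image could look like the sketch below. The package name, launch file and local path are placeholders, and the explicit sourcing in %post is only there in case the base image does not source ROS automatically during the build step:

Bootstrap: localimage
From: /path/to/ros_gazebo.simg

%setup
    # Copy our package into the workspace provided by the base image
    cp -r /path/to/local/code $SINGULARITY_ROOTFS/workspace/src/

%post
    # Source ROS and build the workspace inside the container
    . /opt/ros/kinetic/setup.sh
    cd /workspace
    catkin_make -j1

%runscript
    # Source the built workspace and start the experiment
    . /workspace/devel/setup.sh
    roslaunch my_package my_experiment.launch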

SLURM

SLURM is the scheduling software on Abel responsible for running our experiments on different machines (called nodes in SLURM). We will, again, not document everything SLURM is capable of, but instead focus on what is needed to run Singularity images. It is recommended to read about the different command line tools that exist for SLURM for a better understanding of how to use the system [2].
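As a starting point, the commands you will use most often are sketched below; the script name and job ID are placeholders:

$ sbatch submit_job.sh    # submit a job script to the queue
$ squeue -u $USER         # list your pending and running jobs
$ scancel 1234567         # cancel a job by its job ID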

Configuring SLURM

To illustrate how to use Singularity and SLURM together, we have reproduced our submit script below, which can be used as inspiration. This script is responsible for configuring SLURM for our experiment, requesting resources and loading modules.

#!/bin/bash

# This script is meant to be submitted with `sbatch`

### Configuration for queue system ###
# Account configuration
#SBATCH --account=uio
#SBATCH --job-name=map_gait_expr_v2
# Node configuration
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1024
#SBATCH --time=0:5:0
# Email configuration
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=jorgehn@student.matnat.uio.no

### Cluster setup ###
# Setup environment on cluster
source /cluster/bin/jobsetup
# Clear inherited modules
module purge
# Load specific version of Singularity
module load singularity/2.4
# If anything fails we exit immediately
set -o errexit 

### Experiment configuration ###
# Create result folder with date:
START_DATE=`date +%Y-%m-%d+%H-%M`
EXPERIMENT_FOLDER="$SCRATCH/$START_DATE"

### Initialize experiment ###
# Create result folder
mkdir -p $EXPERIMENT_FOLDER
# Mark result folder for extraction
cleanup "cp -r $EXPERIMENT_FOLDER $JOB_FOLDER"
echo "Storing experiment results in '$EXPERIMENT_FOLDER'"

### Run experiment script ###
srun --output=$EXPERIMENT_FOLDER/job-%2t.out singularity run -n singularity_image.simg

The most important part of the above script is the SLURM configuration. The account information is required to tell SLURM who to bill for the resources used. For most people this will be uio, but if you are part of a project with extra resources on Abel this should be changed. The job name is not required, but can be handy for identifying the job later.

# Account configuration
#SBATCH --account=uio
#SBATCH --job-name=map_gait_expr_v2

The node configuration below tells SLURM that we are requesting 5 tasks (meaning srun will run 5 instances). We also request 2 CPUs per task. The memory request is required and must be estimated (ROS + Gazebo requires around 1~2 GB of memory). The last part is the most important: the time you expect the experiment to run for (in HH:MM:SS format). SLURM uses this time to estimate how and when it should schedule the experiment; request too much and you will wait a long time before the experiment is started, too little and your job will be killed before it is done.

# Node configuration
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1024
#SBATCH --time=0:5:0
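As a rough sanity check, this configuration requests 5 tasks × 2 CPUs per task × 1024 MB per CPU ≈ 10 GB of memory in total, i.e. about 2 GB per task, which matches the 1~2 GB estimate for ROS + Gazebo above.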

The last configuration below is a simple convenience; it tells SLURM to send us an email when the job begins and ends.

# Email configuration
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=jorgehn@student.matnat.uio.no

For all available options see the SLURM documentation for sbatch.

Special notes for Singularity

You should always copy your image to $SCRATCH. The $SCRATCH file system is faster storage much closer to the nodes running the container, which means the image will load faster. If many tasks run the same container (as in the example above) this can lead to serious savings.
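In the submit script above this could be done just before the srun line, for example:

# Copy the container image to the fast $SCRATCH area before running
cp singularity_image.simg $SCRATCH/
srun --output=$EXPERIMENT_FOLDER/job-%2t.out singularity run -n $SCRATCH/singularity_image.simg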

Also note that we ran Singularity with singularity run -n; the -n flag tells Singularity that the container should not have access to outside network resources. This means that two ROS containers started on the same node will not see each other. This is nice since they will not get in each other's way, and we can run several containers on the same node.
