Robin-hpc
Hardware and network configuration
The robin-hpc is a shared resource for Robin's researchers and master's students. The strength of the machine is its large number of CPU cores and amount of RAM. Unfortunately, there is no GPU available in this service.
Specs
| Node        | CPU                | RAM   | OS     |
|-------------|--------------------|-------|--------|
| login node  | 2 cores/4 vCPU     | 16GB  | CentOS |
| worker node | 120 cores/240 vCPU | 460GB | CentOS |
Storage
The storage on the nodes consists of one 1TB disk where 100GB is reserved for software and 900GB is reserved for the users of the node. However, each user has a soft limit of 20 GB and a hard limit of 100 GB with a grace period of 14 days.
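To keep an eye on how much of this quota you are using, you can check the size of your home area with du (a standard tool; the exact quota-reporting setup on robin-hpc is not described here):

du -sh ~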
It's important to note that there is no backup of the disk, so do not use the robin-hpc as a cloud storage service. We suggest using rsync/scp to transfer important files to your home area on login.ifi.uio.no.

E.g.

rsync --progress <file>.tar.gz <username>@login.ifi.uio.no:~/<path>/<to>/<wherever>

It is also possible to mount your UiO home directory on the robin-hpc.
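One way to do this, assuming sshfs is available on robin-hpc (this is not documented here), is:

# Mount your UiO home directory on robin-hpc (the mount point is arbitrary)
mkdir -p ~/uio-home
sshfs <username>@login.ifi.uio.no: ~/uio-home

# Unmount when you are done
fusermount -u ~/uio-home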
Access
Apply for access using this link: https://nettskjema.no/a/robin-hpc
SLURM
Robin-hpc uses SLURM for job scheduling.
To start a job on robin-hpc you will need to use a job script specifying the resources you want to use. The job is started with the following command:
sbatch nameofthejobscript.sh
(Note: Please do not run your program on the login node; test your implementation before copying the files to robin-hpc.)
You can easily copy the files needed to run your program to your home area on robin-hpc with scp.
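For example (the hostname robin-hpc.ifi.uio.no is an assumption here; use the address you were given when access was granted):

scp -r <project-folder> <username>@robin-hpc.ifi.uio.no:~/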
Below you will find a template for the job script, followed by some convenient commands.
SLURM job script template
#!/bin/bash

# This is a template for a slurm job script.
# To start a job on robin-hpc please use the command "sbatch nameofthisscript.sh".

# Job name
#SBATCH --job-name=nameofmyjob

# Wall clock time limit (hh:mm:ss). The program will be killed when the time limit is reached.
#SBATCH --time=01:00:00

# Number of tasks to start in parallel from this script.
# (i.e. myprogram.py below will be started ntasks times)
#SBATCH --ntasks=1

# CPUs allocated per task
#SBATCH --cpus-per-task=16

# Memory allocated per cpu
#SBATCH --mem-per-cpu=1G

# Set exit on errors
set -o errexit
set -o nounset

# Load your environment
source myenv/bin/activate

# Run your program with "srun yourcommand"
# stdout and stderr will be written to a file "slurm-jobid.out".
# (warning: all tasks will write to the same slurm.out file)
srun python3 myprogram.py
Convenient commands
View the status of all jobs:
squeue
View the status of your own jobs:
squeue -u yourusername
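Cancel one of your own jobs (scancel is a standard SLURM command, not specific to robin-hpc; the job id is shown in the first column of squeue):

scancel jobid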
Software
Matlab R2019b
Todo: sebastto
Setting up the SLURM job script
#SBATCH --job-name=matlab_job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task 16

srun matlab -batch "addpath(genpath('/path/to/your/matlab/folder'));run('myScript.m')"
Running MATLAB in batch mode is the safest option for running MATLAB on an HPC. (From the Mathworks documentation)[1]:
matlab -batch statement:

- Starts without the desktop
- Does not display the splash screen
- Executes statement
- Disables changes to preferences
- Disables toolbox caching
- Logs text to stdout and stderr
- Does not display modal dialog boxes
- Exits automatically with exit code 0 if script executes successfully. Otherwise, MATLAB terminates with a non-zero exit code.
The addpath(genpath('/path/to/your/matlab/folder')) part adds the specified directory and all of its subdirectories to the MATLAB search path. Afterwards the main script of your program is run with run('myScript.m').
Utilizing parallel computing in your MATLAB Script
When the SLURM worker node sets up your job, a number of environment variables are set. We can use the environment variable SLURM_CPUS_ON_NODE to get the number of CPU cores available in our MATLAB script. In fact, we can use that variable to dynamically select the number of workers in the MATLAB parallel pool, so that your script works both on your own computer and on the HPC.
SLURM_CPUS_STR = getenv('SLURM_CPUS_ON_NODE');

% Delete parallel pool from earlier runs
delete(gcp('nocreate'));

if isempty(SLURM_CPUS_STR)
    % Run on personal computer (with however many cores your CPU has)
    parpool(6);
else
    % Run on SLURM-scheduled HPC
    SLURM_CPUS_NUM = str2num(SLURM_CPUS_STR);
    parpool(SLURM_CPUS_NUM);
end
Anaconda
Todo: emmaste
Podman
alias docker=podman
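With this alias in place, familiar docker commands are executed by Podman instead. A minimal sketch, assuming images can be pulled from docker.io on robin-hpc:

# Run a container; with the alias above, "docker run" is handled by podman
docker run --rm docker.io/library/python:3.10 python3 --version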