Robin-hpc

From Robin

Current revision as of 09:31, 7 May 2024

Hardware and network configuration

The robin-hpc is a shared resource for Robin's researchers and Master's students. The strength of the machine is its number of CPU cores and amount of RAM. Unfortunately, no GPU is available in this service.

Specs

Node	CPU	RAM	OS
login node	2 cores/4 vCPU	16 GB	RHEL9
worker node	120 cores/240 vCPU	460 GB	RHEL9

Storage

The storage on the nodes consists of one 1 TB disk, of which 100 GB is reserved for software and 900 GB for the users of the node. Each user has a soft limit of 20 GB and a hard limit of 100 GB, with a grace period of 14 days.
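
As a rough way to see where you stand against these limits, the sketch below classifies a usage figure against the 20 GB soft limit and the 100 GB hard limit. The helper names quota_status and dir_usage_gb are made up for this example, and du only approximates what the quota system actually counts.

```shell
# Hypothetical helper: classify a usage figure (whole GB) against the
# per-user limits stated on this page (20 GB soft, 100 GB hard).
quota_status() {
    local used_gb=$1
    if [ "$used_gb" -ge 100 ]; then
        echo "hard limit reached"
    elif [ "$used_gb" -gt 20 ]; then
        echo "over soft limit"
    else
        echo "ok"
    fi
}

# Hypothetical helper: size of a directory in whole GB (rounded down).
# du -sk prints the size in kilobytes followed by the path.
dir_usage_gb() {
    local kb
    kb=$(du -sk "$1" 2>/dev/null | cut -f1)
    echo $(( ${kb:-0} / 1024 / 1024 ))
}

quota_status 35   # sample figure: prints "over soft limit"
```

To check your own home area, run quota_status "$(dir_usage_gb "$HOME")".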

It's important to note that there is no backup of the disk, so do not use the robin-hpc as a cloud storage service. We suggest using rsync or scp to copy important files to your home area on login.ifi.uio.no.

E.g.

rsync --progress <file>.tar.gz <username>@login.ifi.uio.no:~/<path>/<to>/<wherever>

It is also possible to mount your UiO home directory on the robin-hpc.

Network

For security reasons, the robin-hpc is only accessible from ifi's networks. If you use a desktop machine at ifi, you can access the robin-hpc directly. From your own laptop, however, you must first log in to ifi's login cluster, login.ifi.uio.no, before you can access hpc.robin.uiocloud.no.

The hostname of the robin-hpc login node is hpc.robin.uiocloud.no.

SSH from outside UiO's networks

To access the machine from outside UiO's networks, see Configure SSH. To make SSH slightly more pleasant to work with, you can create a configuration file at ~/.ssh/config containing additional information about remote machines. The following is a possible configuration for the robin-hpc (see here for more info). Individual key paths can be added with IdentityFile ~/.ssh/<key_on_jump_host>.

Host uio-login
 User <my-username>
 Hostname <my-uio-hostname>

Host hpc-robin
 HostName hpc.robin.uiocloud.no
 ProxyJump uio-login
 User <my-username>

Now, ssh hpc-robin should directly establish a connection to the server (commands like scp should also work in the same way).
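
The configuration above can be filled in for a concrete user as sketched below. The username myusername is a placeholder, and using login.ifi.uio.no as the jump host is an assumption based on the Network section; the file name ssh_config_robin_hpc.txt is also made up for this example.

```shell
# Assumed values: replace with your own UiO username. login.ifi.uio.no is
# the login cluster named in the Network section.
uio_user="myusername"          # hypothetical username
jump_host="login.ifi.uio.no"

# Write the two Host entries to a scratch file so they can be reviewed
# before being added to ~/.ssh/config.
snippet="ssh_config_robin_hpc.txt"
cat > "$snippet" <<EOF
Host uio-login
 User $uio_user
 HostName $jump_host

Host hpc-robin
 HostName hpc.robin.uiocloud.no
 ProxyJump uio-login
 User $uio_user
EOF

cat "$snippet"
```

After reviewing the file, append it with cat ssh_config_robin_hpc.txt >> ~/.ssh/config. Copying files then goes through the jump host as well, for example scp <file>.tar.gz hpc-robin:~/.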

Access

All you need to do is run ssh <your_username>@hpc.robin.uiocloud.no from a UiO network.

SLURM

Robin-hpc uses Slurm for job scheduling.

To start a job on robin-hpc you will need to use a job script specifying the resources you want to use. The job is started with the following command:

sbatch nameofthejobscript.sh

(Note: Please do not run your program on the login node. Test your implementation before copying the files to robin-hpc. You can easily copy the files needed to run your program to your home area on robin-hpc with scp.)

Below you will find a template for the job script, followed by some convenient commands.

For more information about slurm please visit https://slurm.schedmd.com/quickstart.html.

SLURM job script template

#!/bin/bash

# This is a template for a slurm job script.
# To start a job on robin-hpc please use the command "sbatch nameofthisscript.sh".

# Job name
#SBATCH --job-name=nameofmyjob

# Wall clock time limit (hh:mm:ss).
# (Note: The program will be killed when the time limit is reached.)
#SBATCH --time=01:00:00

# Number of tasks to start in parallel from this script.
# (i.e. myprogram.py below will be started ntasks times)
#SBATCH --ntasks=1

# CPUs allocated per task
#SBATCH --cpus-per-task=16

# Memory allocated per cpu
#SBATCH --mem-per-cpu=1G

# Set exit on errors
set -o errexit
set -o nounset

# Load your environment
source myenv/bin/activate

# Run your program with "srun yourcommand"
# stdout and stderr will be written to a file "slurm-jobid.out".
# (warning: all tasks will write to the same slurm.out file)
srun python3 myprogram.py
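
To see what the template actually requests, it can help to multiply the options out. The sketch below does that arithmetic for the values in the template and compares the result with the worker node in the Specs table. Whether Slurm counts a vCPU or a physical core as one CPU depends on how the cluster is configured; the percentages here assume vCPUs.

```shell
# Values copied from the job script template.
ntasks=1
cpus_per_task=16
mem_per_cpu_gb=1

# Total request: every task gets cpus_per_task CPUs, and memory is
# allocated per CPU.
total_cpus=$(( ntasks * cpus_per_task ))
total_mem_gb=$(( total_cpus * mem_per_cpu_gb ))

# Worker node capacity from the Specs table (assumption: 1 CPU = 1 vCPU).
node_vcpus=240
node_mem_gb=460

echo "Requested: ${total_cpus} CPUs and ${total_mem_gb} GB RAM"
echo "Share of worker node (rounded down): $(( 100 * total_cpus / node_vcpus ))% of vCPUs, $(( 100 * total_mem_gb / node_mem_gb ))% of RAM"
```

Raising --ntasks multiplies both figures, since each task receives its own set of CPUs and the matching memory.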

Additional options

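To run many similar jobs from one script you can use the --array option of the sbatch command by adding the line below to the script. Each job in the array can read its own index from the environment variable SLURM_ARRAY_TASK_ID. The array jobs are scheduled independently, so they may run at the same time when resources allow. The following example would run 900 jobs: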

#SBATCH --array=0-899
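
One common use of the array index is to pick the parameters for each job. The sketch below assumes, purely for illustration, that the 900 jobs form a grid of 30 configurations with 30 seeds each; the function name grid_position and the --config and --seed options of myprogram.py are hypothetical.

```shell
# Hypothetical split of the array into 30 configurations x 30 seeds.
n_seeds=30

# Map an array index (0-899) to "config_index seed".
grid_position() {
    local task_id=$1
    echo "$(( task_id / n_seeds )) $(( task_id % n_seeds ))"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array job; default to 0 so the
# snippet can also be tried outside a job.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
read -r config_index seed <<EOF
$(grid_position "$task_id")
EOF

echo "array job ${task_id}: config ${config_index}, seed ${seed}"
# In the job script this would be followed by something like:
# srun python3 myprogram.py --config "$config_index" --seed "$seed"
```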

Convenient commands

View the status of all jobs:

squeue

View the status of your own jobs:

squeue -u yourusername

Cancel all your jobs:

scancel -u yourusername

Cancel all jobs with name "job_name":

scancel -n job_name
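
To inspect or cancel one specific job you need its job id, which sbatch prints on submission. The sketch below extracts the id from that message; the helper name extract_job_id is made up for this example, and the sample line stands in for real sbatch output so the snippet can be tried anywhere.

```shell
# sbatch normally prints a line of the form "Submitted batch job <id>".
# Hypothetical helper: read that line from stdin and keep the last word.
extract_job_id() {
    local line
    read -r line
    echo "${line##* }"
}

# On the cluster you would write:
#   job_id=$(sbatch nameofthejobscript.sh | extract_job_id)
job_id=$(echo "Submitted batch job 12345" | extract_job_id)   # sample line

echo "Status of this job:  squeue -j ${job_id}"
echo "Cancel this job:     scancel ${job_id}"
```

sbatch also has a --parsable option that prints the job id without the surrounding text, which may be simpler in scripts.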

Software

Anaconda

After initializing Anaconda (see Deep Learning Workstations), it is possible to start a job on robin-hpc by using the command above and the following template:

SLURM Anaconda template

#!/usr/bin/bash

#SBATCH --job-name=<test_name>
#SBATCH --output=log.txt
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Memory per CPU (in megabytes when no unit is given)
#SBATCH --mem-per-cpu=100

# Load anaconda and your environment
source ~/.bashrc
conda activate <your_environment>

# Run your program with "srun your_command"
srun python3 <script_name>.py

The output is written to "log.txt" and can be traced via "tail -f log.txt".

Running containers

Apptainer is installed on robin-hpc. Read more about Apptainer/Singularity at https://sylabs.io/guides/3.0/user-guide/index.html.
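
A container can be used from a job script in the same way as any other command. The sketch below writes a minimal job script that runs a program inside a container; the image name python.sif, the Docker image used to build it, the job name and the resource values are all assumptions for illustration.

```shell
# Write a hypothetical job script to container_job.sh. The quoted 'EOF'
# keeps the contents literal.
cat > container_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=container-test
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

set -o errexit
set -o nounset

# python.sif is assumed to have been created beforehand, e.g. with
#   apptainer pull python.sif docker://python:3.11-slim
srun apptainer exec python.sif python3 myprogram.py
EOF

echo "Wrote container_job.sh; submit it with: sbatch container_job.sh"
```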

Acknowledge use of robin-hpc

The robin-hpc is run on NREC. Please acknowledge usage of their services using the following statement:

The computations were performed on the Norwegian Research and Education
Cloud (NREC), using resources provided by the University of
Bergen and the University of Oslo. http://www.nrec.no/

Ref https://docs.nrec.no/faq.html#how-to-acknowledge-the-use-of-nrec
