Deep Learning Workstations

Revision as of 15:40, 5 March 2021

Deep Learning Workstations

We have shared workstations for projects that require GPU and CPU power while retaining physical access to a computer. However, the local Deep Learning Workstations might not be the best solution for your project. Please see the High Performance Computing article (https://robin.wiki.ifi.uio.no/High_performance_computing) for more information.

The most interesting machines are the four Dell Alienware computers from January 2018. Their details and responsible staff are listed below for record keeping.

We divide the available machines into two categories, namely supported and stand-alone. supported means that the client runs a UiO-supported operating system, typically the latest LTS version of Red Hat Enterprise Linux. The other category, stand-alone, is set up with a third-party OS and custom packages, meaning that the researchers themselves maintain the client.

For master's projects we prefer that students use machines in the supported category.

  • Alienware Aurora R7 - Intel i7 8700K (6-core), Nvidia RTX3090 (UiO: 113616). @vegardds/shared resource
    • hostname: dancer
    • URL: dancer.ifi.uio.no
    • Category: supported
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia RTX2080ti (UiO: 113615). @kaiolae, @benediwa
    • hostname: dasher
    • URL: dasher-robin.duckdns.org
    • Category: stand-alone
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Alienware Aurora R7 - Intel i7 8700K (6-core), Nvidia RTX3090 (UiO: 113617). @vegardds/shared resource
    • hostname: dunder
    • URL: dunder.ifi.uio.no
    • Category: supported
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Alienware Area 51 R3 - AMD Threadripper 1950x (16-core), 3x Nvidia GTX1080ti (UiO: 113614). @vegardds/shared resource
    • hostname: rudolph
    • URL: rudolph.ifi.uio.no
    • Category: supported
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker (2016): (Fractal Design enclosure) Intel i7 6700K (4-core), 2x Nvidia GTX1080. Charles/Justas/master's students
    • hostname: deepthinker
    • URL: deepthinker-robin.duckdns.org
    • Category: stand-alone
  • Mac Pro (2010): Intel Xeon (8-core), 1x Nvidia GTX1060 (+ ATI 5770 1GB, not useful for DL)... Justas etc.
    • Good machine for quick tests. GPU is good for testing DL but only has 3GB memory.
    • Category: stand-alone

Getting access to the workstations

To get access to the workstations you either need to be a master's student at ROBIN or have an advisor at ROBIN who can vouch for you.

To access the supported clients, all you need to do is ssh into the machine with your UiO credentials. See Configure SSH below for details.

Configure SSH

To make SSH slightly more pleasant to work with we can create a configuration file at ~/.ssh/config which can contain additional information about remote machines. The following is a possible configuration for the Rudolph workstation.

Host rudolph
    User <my-username>
    HostName rudolph.ifi.uio.no
    IdentityFile ~/.ssh/rudolph

This allows us to ssh using the command ssh rudolph without any other configuration.
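
Before connecting, it can be useful to check how OpenSSH resolves the alias. The following is a minimal sketch, assuming a reasonably recent OpenSSH client, whose -G option prints the effective settings for a host without connecting. The helper name show_ssh_target and its optional second argument are made up for illustration.

```shell
# Print the user, host name and key files OpenSSH would use for an alias.
# show_ssh_target is a hypothetical helper; nothing is sent over the network.
# Usage: show_ssh_target <alias> [config-file]
show_ssh_target() {
    if [ -n "${2:-}" ]; then
        # Read an explicitly given config file instead of ~/.ssh/config.
        ssh -G -F "$2" "$1"
    else
        ssh -G "$1"
    fi | grep -Ei '^(user|hostname|identityfile) '
}
```

With the configuration above in place, running show_ssh_target rudolph should list the user, the host name and the identity file from the Host rudolph entry.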

SSH from outside UiO's networks

To access the machines from outside UiO's networks, you need to set up the following in the ~/.ssh/config file (here we use dancer as an example):

Host ifi-login
     Hostname login.ifi.uio.no
     User <my-username>
Host dancer
     User <my-username>
     IdentityFile ~/.ssh/<key_to_dancer>
     ProxyCommand ssh -q ifi-login nc dancer.ifi.uio.no 22

On Windows machines you need to give the full path to ssh.exe in the ProxyCommand to make it work:

Host dancer
     User <my-username>
     IdentityFile ~/.ssh/<key_to_dancer>
     ProxyCommand C:\Windows\System32\OpenSSH\ssh.exe -q ifi-login nc dancer.ifi.uio.no 22

Now you should be able to run ssh dancer.
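
For a one-off connection without editing ~/.ssh/config, newer OpenSSH clients can jump through the login host with the -J option. This is a sketch, assuming OpenSSH 7.3 or later on your own machine and a workstation that has an ifi.uio.no name (the supported clients in the list above). The helper name jump_to and the DRY_RUN switch are hypothetical.

```shell
# Connect to a supported workstation through login.ifi.uio.no in one command.
# Usage: jump_to <my-username> <hostname>      e.g. jump_to alice dancer
# Set DRY_RUN=1 to print the ssh command instead of running it.
jump_to() {
    local user="$1"
    local host="$2"
    set -- ssh -J "${user}@login.ifi.uio.no" "${user}@${host}.ifi.uio.no"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

The ProxyCommand configuration above remains the better choice for daily use, since it also stores the user name and key file.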

SSH key

We recommend using key-pairs for secure login. See more information here: https://www.ssh.com/ssh/keygen/.

NOTE: If you use key pairs, never share your private key.
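
A minimal sketch of creating a key pair with ssh-keygen, assuming you want one dedicated Ed25519 key per workstation. The helper name make_workstation_key and the file name ~/.ssh/<hostname> are illustrative choices, not a ROBIN convention.

```shell
# Create a dedicated key pair for one workstation.
# The private key stays in ~/.ssh/<name>; only <name>.pub is ever handed out.
# Usage: make_workstation_key <name> [key-file]
make_workstation_key() {
    local name="$1"
    local key_file="${2:-$HOME/.ssh/$name}"
    # ssh-keygen asks for a passphrase; setting one is recommended.
    ssh-keygen -t ed25519 -f "$key_file" -C "$(whoami)@${name}"
}
```

After make_workstation_key dancer, the IdentityFile line in ~/.ssh/config can point at ~/.ssh/dancer. On a machine where you can already log in with a password, ssh-copy-id -i ~/.ssh/dancer.pub dancer should install the public key. For the stand-alone machines, the public key file is what you attach to the access request e-mail.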

Mosh

We also recommend that students install mosh (https://mosh.org/) for a better ssh experience. Usage is the same as with ssh, except that mosh stays connected while roaming and through laptop hibernation. Mosh should be installed on all workstations.

NOTE: mosh does not support X-Forwarding.

Using the supported workstations

Anaconda

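The most frequently used machine learning tools are available through Anaconda. If you are not familiar with Anaconda, take a look at these tutorials (https://docs.anaconda.com/anaconda/navigator/tutorials/). The Anaconda cheat sheet (https://docs.anaconda.com/_downloads/9ee215ff15fde24bf01791d719084950/Anaconda-Starter-Guide.pdf) is also worth downloading.

To see how you can install machine learning packages, see Working with GPU packages (https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/).

NOTE: the environments will be installed in your home directory, e.g. ~/.conda/envs/<env_name>. Make sure that you have enough storage.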
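
Environments with GPU packages can take several gigabytes, so it helps to check the space before creating one. A small sketch using standard tools; the helper name conda_env_usage is hypothetical, and the default path is the one mentioned in the note above.

```shell
# Show how much space the conda environments use, followed by the free
# space on the file system that holds the home directory.
# Usage: conda_env_usage [env-dir]
conda_env_usage() {
    local env_dir="${1:-$HOME/.conda/envs}"
    if [ -d "$env_dir" ]; then
        du -sh "$env_dir"
    else
        echo "no environments found in $env_dir"
    fi
    df -h "$HOME"
}
```

If space is tight, conda clean --all removes cached package files that are no longer needed, and conda env remove --name <env_name> deletes an environment you no longer use.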

Initialization

To use Anaconda on the DL workstations, you need to do some first-time configuration. Log in to an IFI client, e.g. login.ifi.uio.no, and run the following:

$ export PATH="/opt/ifi/anaconda3/bin:$PATH"
$ conda init

These commands add conda's initialization code to your ~/.bashrc. After you start a new shell, you should see that you are working in the base environment.
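
To confirm that the initialization took effect, you can look for the block that conda init writes between its "conda initialize" marker comments. A sketch, with a hypothetical helper name conda_is_initialized:

```shell
# Succeed if the given shell start-up file contains the block written by
# `conda init` (it starts with the marker "# >>> conda initialize >>>").
# Usage: conda_is_initialized [rc-file]
conda_is_initialized() {
    local rc_file="${1:-$HOME/.bashrc}"
    grep -q '>>> conda initialize >>>' "$rc_file" 2>/dev/null
}
```

For example, conda_is_initialized && echo "conda is set up" prints the message only when the block is present. If it is, open a new shell (or run source ~/.bashrc) and conda env list should then work.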

Example usage

There is a large community around Anaconda and deep learning tools, so we encourage you to investigate on your own how to use them. The simple example below should get you started.

Let's say we want to use TensorFlow in our project. First we need to create an environment:

$ conda create --name ml_project tensorflow-gpu

When the command is executed, you will get a list of all the packages required to create this environment and the size of the installation. Once you accept, the installation begins. This might take a while depending on the size of the packages.

On completion you should be able to run this command to see your new environment:

$ conda env list

This lists the environments you can activate:

# conda environments:
#
base                  *  /opt/ifi/anaconda3
ml_project               /uio/kant/ifi-ansatt-u03/vegardds/.conda/envs/ml_project

The star (*) indicates which environment you are currently using.

To change to the new environment run:

$ conda activate ml_project

The environment should now be ready to go. If you need another package, use:

$ conda install <package_name>

You can find the most widely used ML packages here: https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/.
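
Once the environment is active, it can help to confirm that TensorFlow sees the GPU and to record the installed package versions. This is a sketch under two assumptions: the conda package installed TensorFlow 2.1 or later (older releases do not have tf.config.list_physical_devices), and the ml_project environment from the example is active. Both helper names are hypothetical.

```shell
# Ask TensorFlow which GPUs it can see from the active environment.
# An empty list ([]) means TensorFlow found no usable GPU.
list_tf_gpus() {
    python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# Save the exact package list of an environment to <env-name>.yml
# so that the environment can be recreated later.
# Usage: export_env ml_project
export_env() {
    local env_name="$1"
    conda env export --name "$env_name" > "${env_name}.yml"
}
```

The exported file can be turned back into an environment with conda env create --file ml_project.yml, for instance on another supported workstation.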

Running DL-activities on the supported workstations

Please keep in mind that these workstations are a shared resource. They have no job queue, so we encourage you to share the resources as best you can. The following tools help show which processes are running:

Some commands you need to get familiar with

  • nvidia-smi: See current GPU usage
  • htop: See current CPU/RAM usage
  • w: See currently logged in users
  • Mattermost: ROBIN-workspace with channels named after the hostnames of the DL-workstations.

Please use the Mattermost ROBIN-workspace to coordinate resource usage.

Before running DL-activities

  1. Make sure you are on one of the supported workstations, e.g. dunder.
  2. Check if someone is running DL-activities using the commands above.
  3. While running experiments, be available for DMs on Mattermost and messages in the <hostname>-channel.
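
The checks above can be sketched as a pair of small helpers. This assumes the nvidia-smi query options shown here are available with the installed driver; the helper names and the script name train.py are hypothetical.

```shell
# Print one CSV line per GPU: index, utilisation (%), memory in use (MiB).
gpu_status() {
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
               --format=csv,noheader,nounits
}

# Read gpu_status output on stdin and print the index of the GPU
# that currently has the least memory in use.
least_used_gpu() {
    sort -t, -k3,3n | head -n 1 | cut -d, -f1
}
```

On a machine with several GPUs, such as rudolph, CUDA_VISIBLE_DEVICES="$(gpu_status | least_used_gpu)" python train.py restricts a job to the least loaded card. This does not replace asking in the Mattermost channel, because another user may be about to start a job on the same GPU.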

Using the stand-alone workstations

To get access to one of the local machines at Robin, send an email to robin-engineer@ifi.uio.no including the following information:

myusername=<username>
supervisor=<username>
masters_delivery_deadline=DD-MM-YYYY
software_requirements=e.g. Python3, TensorFlow, Caffe
ssh_access=Y/n (if Y, add a public-key as an appendix in the e-mail)

When you get access to the computer, you will additionally be added to a Mattermost channel with the same name as your host computer. There is no job scheduler on these machines, so you must coordinate with the other users when running computations. Make sure you are familiar with the nvidia-smi command so that you can check the current GPU status.

NOTE: These machines do not have any backup. Make sure to sync your files (e.g. to your home directory at UiO) when you have finished an experiment.
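
One way to do the sync is with rsync over ssh. This is a sketch that assumes rsync is installed on the workstation and that your UiO home directory is reachable through login.ifi.uio.no. The helper name sync_to_uio and the target directory experiments/ are made up for illustration.

```shell
# Copy the contents of an experiment directory to another location,
# by default experiments/ in your UiO home directory.
# Usage: sync_to_uio <source-dir> <uio-username> [destination]
sync_to_uio() {
    local src_dir="$1"
    local uio_user="$2"
    local dest="${3:-${uio_user}@login.ifi.uio.no:experiments/}"
    # -a keeps timestamps and permissions, --partial keeps interrupted transfers.
    rsync -av --partial "${src_dir%/}/" "$dest"
}
```

Running the same command again after the next experiment only transfers files that are new or have changed.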

To ssh to the host, you need to be on eduroam or on VPN. Be aware that these machines require some maintenance, which will lead to occasional downtime.

Current usage of the supported clients

[Embedded graphs: CPU usage history and GPU usage history of the supported clients]

Got questions?

If you have any questions, feel free to use Mattermost. If you have feature requests or technical questions, send them to robin-engineer@ifi.uio.no.
