Deep Learning Workstations

From Robin

Revision as of 20 March 2020, 16:35, by Vegardds

We have shared workstations for projects needing serious GPU and CPU power while retaining physical access to a computer.

The most interesting are the four Dell Alienware computers from January 2018. Their details and responsible staff are listed here for ongoing record keeping:

  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113616). Farzan
    • hostname: dancer
    • URL: dancer-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia RTX2080ti (UiO: 113615). Kai
    • hostname: dasher
    • URL: dasher-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1070ti (UiO: 113617). Vegard/master's students
    • hostname: dunder
    • URL: dunder-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Alienware Area 51 R3 - AMD Threadripper 1950x (16-core), 1x Nvidia GTX1070ti, 2x Nvidia GTX1080ti (UiO: 113614). Jørgen etc.
    • hostname: rudolph
    • URL: rdlf.duckdns.org
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker (2016): (Fractal Design enclosure) Intel i7 6700K (4-core), 2x Nvidia GTX1080. Charles/Justas/master's students
    • hostname: deepthinker
    • URL: deepthinker-robin.duckdns.org
  • Mac Pro (2010): Intel Xeon (8-core), 1x Nvidia GTX1060 (+ ATI 5770 1GB, not useful for DL)... Justas etc.
    • Good machine for quick tests. GPU is good for testing DL but only has 3GB memory.

Getting access to the workstations

To get access to the workstations, you either need to be a master's student at ROBIN or have an advisor at ROBIN who can vouch for you.

Getting a user account

To get access to one of the local machines at Robin, send an email to robin-engineer@ifi.uio.no including the following information:

myusername=<username>
supervisor=<username>
masters_delivery_deadline=DD-MM-YYYY
software_requirements=e.g. Python3, TensorFlow, Caffe
ssh_access=Y/n (if Y, attach your public key to the e-mail)

When you get access to a computer, you will also be added to a channel with the same name as your host computer. There is no job scheduler on these machines, so you must coordinate with the other users before running computations. Make sure you are familiar with the nvidia-smi command so that you can check the current GPU status.
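
As a rough illustration of checking the GPUs before starting a job, the sketch below parses the CSV output of nvidia-smi's query mode. The function names and the 500 MiB "idle" threshold are assumptions made for this example, not an agreed convention on these machines, and the sample text stands in for real output.

```python
import subprocess

QUERY_FIELDS = "index,name,utilization.gpu,memory.used,memory.total"


def parse_gpu_status(csv_text):
    """Parse the output of nvidia-smi --query-gpu=... --format=csv,noheader,nounits."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "utilization_percent": int(util),
            "memory_used_mib": int(mem_used),
            "memory_total_mib": int(mem_total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the current machine and return the parsed status."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_status(output)


def idle_gpus(gpus, max_memory_used_mib=500):
    """Return indices of GPUs that look unused (the threshold is an arbitrary assumption)."""
    return [g["index"] for g in gpus if g["memory_used_mib"] <= max_memory_used_mib]


# Made-up sample output for a machine with two cards, one of them busy.
SAMPLE = "0, GeForce GTX 1080 Ti, 97, 10542, 11178\n1, GeForce GTX 1080 Ti, 0, 12, 11178\n"
print(idle_gpus(parse_gpu_status(SAMPLE)))
```

On a workstation, idle_gpus(query_gpus()) gives the same view for the real cards. An idle GPU is not a reservation, so announcing your job in the channel is still needed.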

These machines do not have any backup. Make sure to sync your files (e.g. to your home directory at UiO) when you have finished an experiment.
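
One possible way to make the sync a habit is to wrap rsync in a small helper. This is only a sketch: the destination string in the example is a hypothetical placeholder, since this page does not say which host serves the UiO home directories, and the chosen rsync flags (-a, -z, --partial) are a suggestion.

```python
import os
import subprocess


def rsync_command(source_dir, destination):
    """Build an rsync command that copies the contents of source_dir to destination."""
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    source = os.path.expanduser(source_dir).rstrip("/") + "/"
    return ["rsync", "-az", "--partial", source, destination]


def sync(source_dir, destination):
    """Run the sync and raise an error if rsync fails."""
    subprocess.check_call(rsync_command(source_dir, destination))


# Hypothetical destination; replace it with wherever your UiO home directory is reachable.
print(" ".join(rsync_command("/home/username/experiments/run-01", "username@remote-host:backup/run-01/")))
```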

To ssh to the host, you need to be on eduroam or on VPN. Be aware that these machines require some maintenance, which will lead to occasional downtime.

Setting up remote access

To gain remote access you need to generate an SSH key, which must then be installed on the machine. For security reasons we do not currently allow remote access with passwords. The following steps show how to generate an SSH key on Linux; for Windows, see this article.

First we will generate an SSH key using the program ssh-keygen.

$ ssh-keygen -o

Generating public/private rsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_rsa): /home/username/.ssh/name-of-remote-machine
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Follow the prompts above to generate a public and private SSH key. We recommend that you choose a secure passphrase, but it is not strictly required. The private key should never be shared with anybody; the public key is what needs to be copied to the physical machine.

Copy the file /home/username/.ssh/name-of-remote-machine.pub to a USB drive and ask robin-engineer@ifi.uio.no (Vegard) for physical access to the machine. Once there, log in as your user and append the content of the public key to the file ~/.ssh/authorized_keys. You should now have remote access to the machine. Before leaving, it is prudent to test that everything works by logging in remotely from the machine you created the SSH keys on: ssh username@remote-url.
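
The step of adding the key to ~/.ssh/authorized_keys could be scripted roughly as follows. The helper name install_public_key is hypothetical, and the demo writes to a temporary directory with a placeholder key rather than touching a real ~/.ssh. The permissions it sets (700 on the directory, 600 on the file) are the usual conservative choice for OpenSSH.

```python
import os
import tempfile


def install_public_key(public_key_line, ssh_dir):
    """Append a public key to ssh_dir/authorized_keys unless it is already present."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key_line.strip()
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if key not in [line.strip() for line in content.splitlines()]:
        with open(path, "a") as handle:
            # Keep existing keys intact even if the file lacks a final newline.
            if content and not content.endswith("\n"):
                handle.write("\n")
            handle.write(key + "\n")
    os.chmod(path, 0o600)
    return path


# Demo against a temporary directory instead of the real ~/.ssh.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_key = "ssh-rsa AAAAB3...example username@laptop"  # placeholder, not a real key
    target = install_public_key(demo_key, os.path.join(demo_dir, ".ssh"))
    install_public_key(demo_key, os.path.join(demo_dir, ".ssh"))  # second call adds nothing
    with open(target) as handle:
        demo_lines = handle.read().splitlines()
    print(demo_lines)
```

Appending instead of overwriting matters here: authorized_keys may already hold keys for your other machines.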

After setting up remote access, you will still need to be connected to UiO, either through eduroam or VPN, to access the machines.

Configure SSH

To make SSH slightly more pleasant to work with we can create a configuration file at ~/.ssh/config which can contain additional information about remote machines. The following is a possible configuration for the Rudolph workstation.

Host rudolph
    User my-username
    HostName rdlf.duckdns.org
    IdentityFile ~/.ssh/rudolph

This allows us to ssh using the command ssh rudolph without any other configuration.
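
If you use several of the workstations, the entries can be generated instead of typed by hand. The sketch below takes the host names from the list at the top of this page; the helper names and the convention of one key file per machine named after its alias (as in the Rudolph example) are assumptions.

```python
# Aliases and URLs as listed on this page.
WORKSTATIONS = {
    "dancer": "dancer-robin.duckdns.org",
    "dasher": "dasher-robin.duckdns.org",
    "dunder": "dunder-robin.duckdns.org",
    "rudolph": "rdlf.duckdns.org",
    "deepthinker": "deepthinker-robin.duckdns.org",
}


def render_ssh_host(alias, hostname, user, identity_file):
    """Return one Host entry in the same layout as the example above."""
    lines = [
        "Host " + alias,
        "    User " + user,
        "    HostName " + hostname,
        "    IdentityFile " + identity_file,
    ]
    return "\n".join(lines) + "\n"


def render_all(user):
    """Return entries for every workstation, assuming one key per machine."""
    return "\n".join(
        render_ssh_host(alias, host, user, "~/.ssh/" + alias)
        for alias, host in sorted(WORKSTATIONS.items())
    )


print(render_all("my-username"))
```

The printed text can be pasted into ~/.ssh/config; entries for machines you have no account on are harmless but can be deleted.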

We also recommend that students install mosh for a better SSH experience. Usage is exactly the same as with ssh, except that mosh can stay connected while roaming and through laptop hibernation. Mosh should be installed on all workstations.

NOTE: mosh does not support X-Forwarding.

Setting up the workstations

As a rule, shared systems should be able to dual-boot between Windows 10 and Ubuntu.

  • For the Dell systems, first shrink the main NTFS partition to allow an Ubuntu system partition.
  • The shared (spinning) disk can stay as NTFS.
  • Ubuntu 16.04 LTS is the current best practice.
  • TODO: Investigate Ubuntu 18.04
  • CUDA on Linux: choose the deb (network) option. This saves time on patching and updates to the latest NVIDIA driver automatically.
  • CUDA version: Tensorflow 1.6 is built against CUDA 9.0, but the latest version is 9.1, so be careful.
  • CPM installs 9.1 via the network deb (also to get the Nvidia driver), then 9.0 via the run file, following the instructions from here.
    • Kai upgraded Dasher to Tensorflow 1.8 with CUDA 9.2, following these instructions. Note that for CUDA 9.2 to work, you need NVIDIA driver 396 or above.

Tensorflow

  • Install tensorflow-gpu to use GPUs.
  • Tensorflow requires specific versions of CUDA and cuDNN.
    • For Tensorflow 1.4 (current release as of 9/1/2018): needs CUDA 8, cuDNN 6.
    • For Tensorflow 1.5 (prerelease as of 9/1/2018): works with CUDA 9, cuDNN 7 (latest).
    • Tensorflow 1.8 was installed on Dasher with CUDA 9.2, which in turn requires at least version 396 of the NVIDIA driver (the official Tensorflow 1.8 pip packages are built against CUDA 9.0).
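
The version pairs mentioned on this page can be kept in a small lookup table, which makes it easier to check a machine before upgrading. This is a sketch that encodes only what is stated here; entries set to None are values this page does not give, and the function name is made up for the example.

```python
# Tensorflow release -> CUDA / cuDNN versions, as stated on this page.
TENSORFLOW_REQUIREMENTS = {
    "1.4": {"cuda": "8.0", "cudnn": "6"},
    "1.5": {"cuda": "9.0", "cudnn": "7"},
    "1.6": {"cuda": "9.0", "cudnn": None},  # cuDNN version not stated on this page
    "1.8": {"cuda": "9.2", "cudnn": None},  # the CUDA version used on Dasher
}

# Minimum NVIDIA driver (major version) per CUDA release, where this page states one.
MIN_DRIVER_MAJOR = {"9.2": 396}


def driver_is_new_enough(cuda_version, driver_version):
    """Return True/False if a minimum driver is known for cuda_version, else None."""
    minimum = MIN_DRIVER_MAJOR.get(cuda_version)
    if minimum is None:
        return None
    return int(driver_version.split(".")[0]) >= minimum


print(TENSORFLOW_REQUIREMENTS["1.4"])
print(driver_is_new_enough("9.2", "390.48"))
```

The installed driver version appears in the header of the nvidia-smi output and can be passed in as a string such as "396.26".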

Ubuntu 16.04 Setup

Install Ubuntu 16.04. Do not turn Secure Boot off.

  • Shrink Windows NTFS volume in Disk Management
  • Remove graphics cards
  • Install Ubuntu following these instructions
  • Install Nvidia graphics driver from PPA repository.
  • Put graphics cards back in.
  • Verify that you can boot into Windows and Ubuntu.
  • Ubuntu sometimes freezes on shutdown due to an I2C driver, which can be blacklisted.
  • If Ubuntu doesn't log in and the graphics look wrong, it is probably using an unsigned Nvidia driver; use this script to sign the kernel modules. If this path is taken, the modules need to be signed again for every new kernel.
  • Install CUDA 8.0 and cuDNN 6.0 for Tensorflow 1.4.

Setup SSH Server

  • The SSH server config (/etc/ssh/sshd_config) should be hardened to prevent brute force password attacks.
  • Broadly, follow these instructions to disable password access. It may be a good idea to use fail2ban or DenyHosts, but just disabling password access is a good start.
  • Regular users should copy their public key to the machine in person and add it to ~/.ssh/authorized_keys.
  • Note that ssh-copy-id won't work if password access is turned off.
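
To double-check that password logins really are disabled, the server configuration can be inspected with a few lines of Python. This is a simplified sketch: it relies on sshd using the first value it finds for each keyword and on PasswordAuthentication defaulting to yes, but it ignores Match blocks and included files, so treat the result as a hint rather than proof. The function names and the sample configuration are made up for the example.

```python
def effective_option(config_text, keyword, default):
    """Return the first value set for keyword in sshd_config text, else default."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == keyword.lower():
            return parts[1].strip().lower()
    return default


def password_login_disabled(config_text):
    """True if the configuration explicitly turns password authentication off."""
    return effective_option(config_text, "PasswordAuthentication", "yes") == "no"


# Made-up excerpt; on a workstation, read /etc/ssh/sshd_config instead.
SAMPLE_CONFIG = """
# Authentication
PubkeyAuthentication yes
#PasswordAuthentication yes
PasswordAuthentication no
"""
print(password_login_disabled(SAMPLE_CONFIG))
```

Note that the commented-out line is skipped, so only the explicit "PasswordAuthentication no" counts.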

Setup Dynamic DNS

protocol=duckdns
password=token-from-duck-dns
chosen-host-url.duckdns.org
run_daemon="true"

  • start service: sudo service ddclient start
  • show status: sudo service ddclient status

Remote Connections

Port forwarding for Jupyter Notebook

Here’s a trick for using Jupyter Notebook in the browser on your laptop while the work runs on our GPU workstation, Dasher.

Log in to Dasher via SSH with the port forwarding switches:

ssh -L 8888:localhost:8888 charles@dasher-robin.duckdns.org

This forwards port 8888 on your laptop (“localhost:8888”) to “localhost:8888” on dasher.

Then you can run Jupyter on the ssh connection:

jupyter notebook

When Jupyter runs it shows you a special URL for the notebook session, including a unique token. Copy this, then on your local computer, open a browser and go to:

http://localhost:8888/?token=TOKEN_TEXT_HERE

Now you’re ready to compute something difficult!

More info: Ubuntu Help on Port Forwarding
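
Since several people may run notebooks at the same time, port 8888 will not always be free on either end. The sketch below builds the ssh command and the matching local URL for arbitrary ports; the helper names are hypothetical, and the remote port must match the one Jupyter reports when it starts.

```python
def forward_command(user, host, local_port=8888, remote_port=8888):
    """Build the ssh command that forwards local_port to remote_port on host."""
    return [
        "ssh",
        "-L", "{}:localhost:{}".format(local_port, remote_port),
        "{}@{}".format(user, host),
    ]


def local_notebook_url(token, local_port=8888):
    """URL to open in the laptop browser once the tunnel is up."""
    return "http://localhost:{}/?token={}".format(local_port, token)


# Example: Jupyter on dasher reports port 8889, and local port 9000 is free on the laptop.
print(" ".join(forward_command("charles", "dasher-robin.duckdns.org", 9000, 8889)))
print(local_notebook_url("TOKEN_TEXT_HERE", 9000))
```

With the default ports, forward_command reproduces the command shown above.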

Using Tensorflow and Keras

Master student accounts do not have root (admin) access, so you should install python packages you require locally. Using Python 3 is recommended!

The recommended way to do this is with pip and a local virtualenv for your user account. There are good instructions in the Tensorflow install guide: https://www.tensorflow.org/install/install_linux#InstallingVirtualenv

You can do this as follows (it's good Python/Unix practice to know and use virtual environments):

mkdir ~/tensorflow 
cd ~/tensorflow
virtualenv --system-site-packages -p python3 venv
source ~/tensorflow/venv/bin/activate # activates your virtual environment
pip install -U tensorflow-gpu keras matplotlib tensorflow-probability-gpu # add packages you want

The line that starts with "source" starts up your virtualenv, so you'll need to run that whenever you want to do some work in tensorflow and python. This setup is recommended because you then get to have your own version of tensorflow and nothing anybody else does can mess up your work environment. Practical!
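
If you are unsure whether the virtualenv is active in the current shell, a quick check from Python can help. This is a small sketch: older virtualenv releases set sys.real_prefix, while venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix, so the helper (a made-up name) tests for both.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    has_real_prefix = hasattr(sys, "real_prefix")  # set by older virtualenv releases
    base_prefix = getattr(sys, "base_prefix", sys.prefix)  # set by venv and newer virtualenv
    return has_real_prefix or base_prefix != sys.prefix


print("virtualenv active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment from the steps above is active, the interpreter path should point inside ~/tensorflow/venv.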

You can test that it works by opening Python and running some Tensorflow code:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.multiply(tf.constant([2.]), tf.constant([3.]))))

This should print out some stuff about using the GPUs and, for reference, the answer should be 6!
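
Another way to confirm that Tensorflow sees the cards is to list the devices it has registered. The sketch below uses device_lib.list_local_devices() from Tensorflow's Python client module; the helper names and the stand-in device list are assumptions for illustration, and Tensorflow is imported lazily so the filtering logic can be tried without it.

```python
from collections import namedtuple


def gpu_device_names(devices):
    """Return the names of all devices whose type is GPU."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """Ask Tensorflow which GPUs it can use on this machine."""
    from tensorflow.python.client import device_lib  # imported here so the rest runs without Tensorflow
    return gpu_device_names(device_lib.list_local_devices())


# Stand-in device list shaped like Tensorflow's device descriptions (name, device_type).
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
SAMPLE_DEVICES = [
    FakeDevice("/device:CPU:0", "CPU"),
    FakeDevice("/device:GPU:0", "GPU"),
    FakeDevice("/device:GPU:1", "GPU"),
]
print(gpu_device_names(SAMPLE_DEVICES))
```

On the dual-GPU workstations, local_gpu_names() should return two entries; an empty list usually points to a CUDA or driver mismatch, or to the CPU-only tensorflow package being installed.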
