Deep Learning Workstations

From Robin

Revision as of 20 Aug 2018, 12:48, by Charlepm

We have shared workstations for projects needing serious GPU and CPU power while retaining physical access to a computer.

The most interesting are the four Dell Alienware computers from January 2018. Their details and responsible staff are listed here for ongoing record keeping:

  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113616). Justas/Zia/Weria
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113615). Charles/Kai/Tønnes
    • hostname: dasher
    • URL: dasher-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1070ti (UiO: 113617). Vegard/Masterstudenter
    • hostname: dunder
    • URL: mscdeeplearning.duckdns.org
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Alienware Area 51 R3 - AMD Threadripper 1950x (16-core), 1x Nvidia GTX1070ti (UiO: 113614). Jørgen etc.
    • hostname: rudolph
    • URL: rdlf.duckdns.org
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker (2016): (Fractal Design enclosure) Intel i7 6700K (4-core), 2x Nvidia GTX1080. Charles/Justas/Masterstudenter
    • hostname: deepthinker
    • URL: deepthinker.onthewifi.com
  • Mac Pro (2010): Intel Xeon (8-core), 1x Nvidia GTX1060 (+ ATI 5770 1GB, not useful for DL)... Justas etc.
    • Good machine for quick tests. GPU is good for testing DL but only has 3GB memory.

Setting up the workstations

As a rule, shared systems should be able to dual-boot between Windows 10 and Ubuntu.

  • For the Dell systems, first shrink the main NTFS partition to allow an Ubuntu system partition.
  • The shared (spinning) disk can stay as NTFS.
  • Ubuntu 16.04 LTS is the current best practice.
  • TODO: Investigate Ubuntu 18.04
  • CUDA on Linux: choose the deb (network) option. This saves time on patching and automatically updates to the latest NVIDIA driver.
  • CUDA version: Tensorflow 1.6 is built against CUDA 9.0, but the latest CUDA release is 9.1, so be careful to install the version that Tensorflow expects.
  • CPM is installing 9.1 via the network deb (also to get the Nvidia driver), then 9.0 via the run file, following instructions from here.
    • Kai upgraded Dasher to Tensorflow 1.8 with CUDA 9.2, following these instructions. Note that for CUDA 9.2 to work, you need NVIDIA driver 396 or above.

Tensorflow

  • Install tensorflow-gpu to use GPUs.
  • Tensorflow requires specific versions of CUDA and cuDNN.
    • For Tensorflow 1.4 (current release as of 9 January 2018): needs CUDA 8 and cuDNN 6.
    • For Tensorflow 1.5 (prerelease as of 9 January 2018): works with CUDA 9 and cuDNN 7 (latest).
    • Tensorflow 1.8 as installed on Dasher uses CUDA 9.2, which in turn requires at least version 396 of the NVIDIA driver. (The official pip binaries of Tensorflow 1.8 are built against CUDA 9.0.)
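
The version pairings recorded above can be captured in a small lookup table so that a setup script can warn about mismatches. This is only a minimal sketch: the table holds just the combinations noted on this page (it is not an official compatibility list), and the names `REQUIREMENTS` and `check_setup` are hypothetical.

```python
# Version pairings as recorded on this page; not an exhaustive or official list.
REQUIREMENTS = {
    "1.4": {"cuda": "8", "cudnn": "6", "min_driver": None},
    "1.5": {"cuda": "9", "cudnn": "7", "min_driver": None},
    "1.8": {"cuda": "9.2", "cudnn": None, "min_driver": 396},  # Dasher's setup
}


def check_setup(tf_version, cuda_version, driver_version):
    """Return a list of problems; an empty list means no mismatch was found."""
    req = REQUIREMENTS.get(tf_version)
    if req is None:
        return ["no requirements recorded for Tensorflow " + tf_version]

    problems = []

    # Compare only as many components as the table specifies ("9" matches "9.0").
    wanted = req["cuda"].split(".")
    if cuda_version.split(".")[:len(wanted)] != wanted:
        problems.append(
            "Tensorflow {} expects CUDA {}, found {}".format(
                tf_version, req["cuda"], cuda_version))

    # Driver versions look like "396.26"; only the major number is compared.
    driver_major = int(driver_version.split(".")[0])
    if req["min_driver"] is not None and driver_major < req["min_driver"]:
        problems.append(
            "NVIDIA driver {} is older than the required {}".format(
                driver_version, req["min_driver"]))

    return problems


for problem in check_setup("1.8", "9.2", "390.48"):
    print(problem)
```

The installed driver and CUDA versions would have to be read from the machine (for example from the output of nvidia-smi and nvcc); here they are passed in as plain strings to keep the sketch self-contained.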

Ubuntu 16.04 Setup

Install Ubuntu 16.04. Do not turn Secure Boot off.

  • Shrink Windows NTFS volume in Disk Management
  • Remove graphics cards
  • Install Ubuntu following these instructions
  • Install Nvidia graphics driver from PPA repository.
  • Put graphics cards back in.
  • Verify that you can boot into both Windows and Ubuntu.
  • Ubuntu sometimes freezes on shutdown due to an I2C driver, which can be blacklisted.
  • If Ubuntu won't log in and the graphics look wrong, the Nvidia driver is probably unsigned. Use this script to sign the kernel modules. If you take this path, the modules need to be signed again for every new kernel.
  • Install CUDA 8.0 and cuDNN 6.0 for Tensorflow 1.4.

Setup SSH Server

  • The SSH server config (/etc/ssh/sshd_config) should be hardened to prevent brute force password attacks.
  • Broadly, follow these instructions to disable password access. It may be a good idea to use fail2ban or DenyHosts as well, but disabling password access is a good start.
  • Regular users should copy their public key to the machine in person and add it to ~/.ssh/authorized_keys
  • ssh-copy-id won't work if password access is turned off :-\
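
To confirm that password access really is turned off, the relevant sshd_config setting can be checked with a few lines of Python. This is a rough sketch, not a full parser: the function name `password_auth_disabled` is hypothetical, and it ignores Match blocks and Include files. It relies on two documented sshd_config behaviours: keywords are case-insensitive, and the first value found for a keyword is the one used.

```python
def password_auth_disabled(config_text):
    """Return True if the sshd_config text turns password logins off.

    Simplifications: Match blocks and Include files are ignored, and only
    whitespace-separated "Keyword value" lines are recognised.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip().lower()
        if keyword == "passwordauthentication":
            # sshd uses the first value it finds for a keyword.
            return value == "no"
    # Keyword absent: sshd's default is to allow password logins.
    return False


sample = "#PasswordAuthentication yes\nPasswordAuthentication no\n"
print(password_auth_disabled(sample))  # True
```

On a workstation the text would come from the real file, for example `open("/etc/ssh/sshd_config").read()`.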

Setup Dynamic DNS

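Dynamic DNS is handled by ddclient with Duck DNS. The settings in the ddclient configuration (typically /etc/ddclient.conf) are:

protocol=duckdns
password=token-from-duck-dns
chosen-host-url.duckdns.org

To run ddclient as a daemon, set the following (typically in /etc/default/ddclient):

run_daemon="true"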

  • Start the service: sudo service ddclient start
  • Show its status: sudo service ddclient status
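
Once ddclient is running, it can be useful to check that the Duck DNS name actually resolves to the address you expect. The sketch below uses only the Python standard library; the function names and the `resolver` parameter (there so the lookup can be replaced in tests) are hypothetical, and the IP address in the example is a documentation placeholder, not a real workstation address.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def dns_matches(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip."""
    return resolve_ipv4(hostname, resolver) == expected_ip


# Example with a stand-in resolver so the sketch runs without network access.
def fake_resolver(name):
    if name == "chosen-host-url.duckdns.org":
        return "203.0.113.7"  # placeholder address from the documentation range
    raise socket.gaierror("name not known")


print(dns_matches("chosen-host-url.duckdns.org", "203.0.113.7", fake_resolver))
```

Leaving out the `resolver` argument performs a real DNS lookup, which is what you would do on the workstation itself.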

Remote Connections

Port forwarding for Jupyter Notebook

Here’s a trick for using Jupyter Notebook in the browser on your own laptop while the work runs on our GPU workstation, Dasher.

Log in to Dasher via SSH with the port forwarding switch:

ssh -L 8888:localhost:8888 charles@dasher-robin.duckdns.org

This forwards port 8888 on your laptop to “localhost:8888” on Dasher, so anything listening on that port on Dasher is reachable at “localhost:8888” on your laptop.

Then you can run Jupyter on the ssh connection:

jupyter notebook

When Jupyter runs it shows you a special URL for the notebook session, including a unique token. Copy this, then on your local computer, open a browser and go to:

http://localhost:8888/?token=TOKEN_TEXT_HERE

Now you’re ready to compute something difficult!
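
If port 8888 is already in use on your laptop, the same forwarding works with a different local port. The following sketch builds the ssh command and the matching notebook URL; the helper names `forward_command` and `notebook_url` are hypothetical, and only the `-L` option shown above is used.

```python
import shlex


def forward_command(user, host, local_port=8888, remote_port=8888):
    """Build the ssh argument list that forwards local_port to remote_port on host."""
    spec = "{}:localhost:{}".format(local_port, remote_port)
    return ["ssh", "-L", spec, "{}@{}".format(user, host)]


def notebook_url(token, local_port=8888):
    """Build the URL to open in the laptop's browser."""
    return "http://localhost:{}/?token={}".format(local_port, token)


# Use local port 8889 on the laptop, still talking to port 8888 on Dasher.
cmd = forward_command("charles", "dasher-robin.duckdns.org", local_port=8889)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: ssh -L 8889:localhost:8888 charles@dasher-robin.duckdns.org
print(notebook_url("TOKEN_TEXT_HERE", local_port=8889))
# prints: http://localhost:8889/?token=TOKEN_TEXT_HERE
```

The argument list can be passed directly to `subprocess.run` if you prefer to start the connection from a script.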

More info: Ubuntu Help on Port Forwarding

Using Tensorflow and Keras

Master student accounts do not have root (admin) access, so you should install python packages you require locally. Using Python 3 is recommended!

The recommended way to do this is with pip and a local virtualenv for your user account. There are good instructions in the Tensorflow install guide: https://www.tensorflow.org/install/install_linux#InstallingVirtualenv

You can do this as follows (it's good Python/Unix practice to know and use virtual environments):

mkdir ~/tensorflow 
cd ~/tensorflow
virtualenv --system-site-packages -p python3 venv
source ~/tensorflow/venv/bin/activate # activates your virtual environment
pip install -U tensorflow-gpu keras matplotlib tensorflow-probability-gpu # add packages you want

The line that starts with "source" starts up your virtualenv, so you'll need to run that whenever you want to do some work in tensorflow and python. This setup is recommended because you then get to have your own version of tensorflow and nothing anybody else does can mess up your work environment. Practical!
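
If you are unsure whether the virtualenv is active in the current session, Python itself can tell you. This is a small sketch using only the standard library; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


print("interpreter:", sys.executable)
print("inside a virtualenv:", in_virtualenv())
```

When the environment is active, the interpreter path should point into ~/tensorflow/venv rather than to the system Python.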

You can test that it works by opening Python and running some Tensorflow code:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.multiply(tf.constant([2.]), tf.constant([3.]))))

This should print some information about the GPUs in use and, for reference, the answer should be [6.]
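
To see which GPUs Tensorflow can use without reading through the device placement log, the devices can also be listed directly. This sketch assumes the `device_lib` module under `tensorflow.python.client`, which is widely used for this purpose but is not part of the formally documented public API; the function name `list_gpu_names` is hypothetical, and it returns None when Tensorflow is not installed.

```python
def list_gpu_names():
    """Return the GPU device names Tensorflow sees, or None if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


names = list_gpu_names()
if names is None:
    print("Tensorflow is not installed in this environment")
elif not names:
    print("Tensorflow is installed but sees no GPUs")
else:
    print("GPUs:", ", ".join(names))
```

An empty list on one of the workstations usually points to the CPU-only tensorflow package or a CUDA/cuDNN version mismatch.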
