Deep Learning Workstations

From Robin

Revision as of 12:46, 20 August 2018

We have shared workstations for projects needing serious GPU and CPU power while retaining physical access to a computer.

The most interesting machines are the four Dell Alienware computers from January 2018. Their details and responsible staff are listed here for ongoing record keeping:

  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113616). Justas/Zia/Weria
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113615). Charles/Kai/Tønnes
    • hostname: dasher
    • URL: dasher-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1070ti (UiO: 113617). Vegard/Masterstudenter
    • hostname: dunder
    • URL: mscdeeplearning.duckdns.org
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Alienware Area 51 R3 - AMD Threadripper 1950x (16-core), 1x Nvidia GTX1070ti (UiO: 113614). Jørgen etc.
    • hostname: rudolph
    • URL: rdlf.duckdns.org
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker (2016): (Fractal Design enclosure) Intel i7 6700K (4-core), 2x Nvidia GTX1080. Charles/Justas/Masterstudenter
    • hostname: deepthinker
    • URL: deepthinker.onthewifi.com
  • Mac Pro (2010): Intel Xeon (8-core), 1x Nvidia GTX1060 (+ ATI 5770 1GB, not useful for DL)... Justas etc.
    • Good machine for quick tests. GPU is good for testing DL but only has 3GB memory.

Setting up the workstations

As a rule, shared systems should be able to dual-boot between Windows 10 and Ubuntu.

  • For the Dell systems, first shrink the main NTFS partition to allow an Ubuntu system partition.
  • The shared (spinning) disk can stay as NTFS.
  • Ubuntu 16.04 LTS is the current best practice.
  • TODO: Investigate Ubuntu 18.04
  • CUDA on Linux: choose the deb (network) option. This saves time because patches and updates to the latest NVIDIA driver are applied automatically.
  • CUDA version: Tensorflow 1.6 is built against CUDA 9.0, but the latest CUDA version is 9.1, so be careful which one you install.
  • CPM is installing CUDA 9.1 via the network deb (which also provides the NVIDIA driver), then CUDA 9.0 via the run file, following the instructions from here.
    • Kai upgraded Dasher to Tensorflow 1.8 with CUDA 9.2, following these instructions. Note that for CUDA 9.2 to work, you need NVIDIA driver 396 or above.

Tensorflow

  • Install tensorflow-gpu to use GPUs.
  • Tensorflow requires specific versions of CUDA and cuDNN:
    • Tensorflow 1.4 (current release as of 9/1/2018) needs CUDA 8 and cuDNN 6.
    • Tensorflow 1.5 (prerelease as of 9/1/2018) works with CUDA 9 and cuDNN 7 (latest).
    • Tensorflow 1.8 requires CUDA 9.2, which in turn requires at least version 396 of the NVIDIA driver.
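
The version pairings listed above can be kept in a small lookup table so that they are easy to check before an upgrade. This is only a minimal sketch: it records nothing beyond the combinations stated on this page, and the names `TF_REQUIREMENTS`, `requirements_for` and `driver_ok` are hypothetical helpers, not part of Tensorflow or CUDA.

```python
# Version pairings copied by hand from the notes on this page.
# Nothing here is queried from Tensorflow or the NVIDIA driver.
TF_REQUIREMENTS = {
    "1.4": {"cuda": "8", "cudnn": "6"},
    "1.5": {"cuda": "9", "cudnn": "7"},
    "1.8": {"cuda": "9.2", "min_driver": 396},
}


def requirements_for(tf_version):
    """Return the recorded requirements for a Tensorflow version string."""
    key = ".".join(tf_version.split(".")[:2])  # "1.4.1" -> "1.4"
    if key not in TF_REQUIREMENTS:
        raise KeyError("No requirements recorded for Tensorflow " + key)
    return TF_REQUIREMENTS[key]


def driver_ok(tf_version, driver_major):
    """Compare an NVIDIA driver major version with the recorded minimum, if any."""
    minimum = requirements_for(tf_version).get("min_driver")
    return minimum is None or driver_major >= minimum


if __name__ == "__main__":
    print(requirements_for("1.4.1"))
    print(driver_ok("1.8", 390))
```

The driver major version has to be read off the machine by hand (for example from the output of nvidia-smi) and passed in; the sketch does not detect it.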

Ubuntu 16.04 Setup

Install Ubuntu 16.04; do not turn Secure Boot off.

  • Shrink Windows NTFS volume in Disk Management
  • Remove graphics cards
  • Install Ubuntu following these instructions
  • Install Nvidia graphics driver from PPA repository.
  • Put graphics cards back in.
  • Verify that you can boot into both Windows and Ubuntu.
  • Ubuntu sometimes freezes on shutdown due to an I2C driver, which can be blacklisted.
  • If Ubuntu doesn't log in and the graphics look wrong, it is probably using an unsigned Nvidia driver; use this script to sign the kernel modules. If you take this path, the modules need to be signed again for every new kernel.
  • Install CUDA 8.0 and cuDNN 6.0 for Tensorflow 1.4.

Setup SSH Server

  • The SSH server config (/etc/ssh/sshd_config) should be hardened to prevent brute force password attacks.
  • Broadly, follow these instructions to disable password access. It may be a good idea to use fail2ban or DenyHosts as well, but disabling password access is a good start.
  • Regular users should copy their public key to the machine in person and add it to ~/.ssh/authorized_keys
  • ssh-copy-id won't work if password access is turned off :-\
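
After editing /etc/ssh/sshd_config it is worth confirming that password logins are really off. The following is a rough sketch of such a check; the function name `password_auth_disabled` is made up for illustration, it assumes whitespace-separated `keyword value` lines, and it ignores Match blocks and included files.

```python
def password_auth_disabled(config_text):
    """Return True if the first PasswordAuthentication setting is 'no'.

    sshd uses the first value it finds for a keyword. If the keyword is
    missing, the OpenSSH default is 'yes', so this returns False.
    """
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        if parts[0].lower() == "passwordauthentication":
            value = parts[1].strip().lower() if len(parts) > 1 else ""
            return value == "no"
    return False


if __name__ == "__main__":
    sample = "# hardened config\nPasswordAuthentication no\n"
    print(password_auth_disabled(sample))
```

On a workstation the same function could be applied to the contents of /etc/ssh/sshd_config, read as text by a user who has permission to view the file.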

Setup Dynamic DNS

protocol=duckdns
password=token-from-duck-dns
chosen-host-url.duckdns.org
run_daemon="true"

  • start service: sudo service ddclient start
  • show status: sudo service ddclient status

Remote Connections

Port forwarding for Jupyter Notebook

Here’s a trick for using Jupyter Notebook in the browser on your own laptop while the work runs on our GPU workstation, Dasher.

Login to Dasher via SSH with the port forwarding switches:

ssh -L 8888:localhost:8888 charles@dasher-robin.duckdns.org

This forwards port 8888 on your laptop (“localhost:8888”) to “localhost:8888” on Dasher, so anything listening on that port on Dasher is reachable from your laptop.

Then you can run Jupyter on the ssh connection:

jupyter notebook

When Jupyter runs it shows you a special URL for the notebook session, including a unique token. Copy this, then on your local computer, open a browser and go to:

http://localhost:8888/?token=TOKEN_TEXT_HERE

Now you’re ready to compute something difficult!

More info: Ubuntu Help on Port Forwarding
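
If port 8888 is already in use on your laptop, the same trick works with a different local port. The sketch below shows how the ssh arguments and the browser URL fit together; `forward_command` and `notebook_url` are hypothetical helper names, and the user and host are just the example values from above.

```python
import shlex


def forward_command(user, host, local_port=8888, remote_port=8888):
    """Build the ssh argument list that forwards local_port to the workstation."""
    spec = "{}:localhost:{}".format(local_port, remote_port)
    return ["ssh", "-L", spec, "{}@{}".format(user, host)]


def notebook_url(token, local_port=8888):
    """Build the URL to open in the laptop browser once the tunnel is up."""
    return "http://localhost:{}/?token={}".format(local_port, token)


if __name__ == "__main__":
    args = forward_command("charles", "dasher-robin.duckdns.org", local_port=9999)
    print(" ".join(shlex.quote(a) for a in args))
    print(notebook_url("TOKEN_TEXT_HERE", local_port=9999))
```

Note that the URL Jupyter prints on the workstation still mentions port 8888; in the browser you use the local port you chose, with the same token.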

Using Tensorflow and Keras

Master student accounts do not have root (admin) access, so you should install python packages you require locally. Using Python 3 is recommended!

The recommended way to do this is with pip and a local virtualenv for your user account. There are good instructions in the Tensorflow install guide: https://www.tensorflow.org/install/install_linux#InstallingVirtualenv

You can do this as follows (it's good Python/Unix practice to know and use virtual environments):

mkdir ~/tensorflow
cd ~/tensorflow
virtualenv --system-site-packages -p python3 venv
source ~/tensorflow/venv/bin/activate # activates your virtual environment
pip install -U tensorflow-gpu keras matplotlib tensorflow-probability-gpu # add packages you want
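
A common mistake is to run pip or Python without the environment activated, in which case packages end up somewhere else. As a quick sanity check, the sketch below reports whether the running interpreter belongs to a virtual environment; `in_virtualenv` is a hypothetical helper name, not part of virtualenv itself.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

With the environment from the commands above activated, the reported prefix should point inside ~/tensorflow/venv.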

You can test that it works by opening Python and running some Tensorflow code:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.multiply(tf.constant([2.]), tf.constant([3.]))))

This should print device placement information showing that the GPUs are in use and, for reference, the answer should be 6!
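
If the placement log is hard to read, another way to see which GPUs Tensorflow has found is to list the local devices directly. This is a sketch rather than an official recipe: `gpu_device_names` is a made-up helper, and it simply returns an empty list when Tensorflow cannot be imported in the current environment.

```python
def gpu_device_names():
    """Return the names of GPUs visible to Tensorflow, or [] if it is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    print("GPUs visible to Tensorflow:", names if names else "none")
```

An empty result on one of the workstations usually means either that the virtual environment is not active or that the installed CUDA and cuDNN versions do not match the Tensorflow build.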
