Deep Learning Workstations

Fra Robin

(Forskjeller mellom versjoner)
Gå til: navigasjon, søk
(Deep Learning Workstations)
(Setting up the workstations)
Linje 55: Linje 55:
* CUDA Version: Tensorflow 1.6 is built against 9.0, Latest version is 9.1 - be careful.
* CUDA Version: Tensorflow 1.6 is built against 9.0, Latest version is 9.1 - be careful.
* CPM is installing 9.1 via network deb (also to get Nvidia driver), then 9.0 via run file [https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77 following instructions from here].
* CPM is installing 9.1 via network deb (also to get Nvidia driver), then 9.0 via run file [https://blog.kovalevskyi.com/multiple-version-of-cuda-libraries-on-the-same-machine-b9502d50ae77 following instructions from here].
 +
** Kai upgraded Dasher to Tensorflow 1.8 with CUDA 9.2, [https://medium.com/@zhanwenchen/install-cuda-and-cudnn-for-tensorflow-gpu-on-ubuntu-79306e4ac04e following these instructions]. Note that for CUDA 9.2 to work, you need NVIDIA driver 396 or above.
== Tensorflow ==
== Tensorflow ==

Versjonen fra 28. jun 2018 kl. 12:15

Innhold

Deep Learning Workstations

We have shared workstations for projects needing serious GPU and CPU power while retaining physical access to a computer.

Most interesting are the four Dell Alienware computers from January 2018, here are their details and responsible staff for ongoing record keeping:

  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113616). Justas/Zia/Weria
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113615). Charles/Kai/Tønnes
    • hostname: dasher
    • URL: dasher-robin.duckdns.org
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Alienware Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1070ti (UiO: 113617). Vegard/Masterstudenter
    • hostname: dunder
    • URL: mscdeeplearning.duckdns.org
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Alienware Area 51 R3 - AMD Threadripper 1950x (16-core), 1x Nvidia GTX1070ti (UiO: 113614). Jørgen etc.
    • hostname: rudolph
    • URL: rdlf.duckdns.org
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker (2016): (Fractal Design enclosure) Intel i7 6700K (4-core), 2x Nvidia GTX1080. Charles/Justas/Masterstudenter
    • hostname: deepthinker
    • url: deepthinker.onthewifi.com
  • Mac Pro (2010): Intel Xeon (8-core), 1x Nvidia GTX1060 (+ ATI 5770 1GB, not useful for DL)... Justas etc.
    • Good machine for quick tests. GPU is good for testing DL but only has 3GB memory.

Setting up the workstations

As a rule, shared systems should be able to dual-boot between Windows 10 and Ubuntu.

  • For the Dell systems, first shrink the main NTFS partition to allow an Ubuntu system partition.
  • The shared (spinning) disk can stay as NTFS.
  • Ubuntu 16.04 LTS current best practice
  • TODO: Investigate Ubuntu 18.04
  • CUDA on Linux: Choose the deb (network) option - this saves time from doing patches and updates to the latest NVIDIA driver automagically.
  • CUDA Version: Tensorflow 1.6 is built against 9.0, Latest version is 9.1 - be careful.
  • CPM is installing 9.1 via network deb (also to get Nvidia driver), then 9.0 via run file following instructions from here.
    • Kai upgraded Dasher to Tensorflow 1.8 with CUDA 9.2, following these instructions. Note that for CUDA 9.2 to work, you need NVIDIA driver 396 or above.

Tensorflow

  • Install tensorflow-gpu to use GPUs.
  • Tensorflow requires specific versions of CUDA and CUDnn
    • For Tensorflow 1.4 (current release as of 9/1/2018): Need CUDA 8, CUDnn 6
    • For Tensorflow 1.5 (prerelease as of 9/1/2018) - works with CUDA 9, CUDnn 7 (latest).

Ubuntu 16.04 Setup

Install Ubuntu 16.04, do not turn Secure Boot off.

  • Shrink Windows NTFS volume in Disk Management
  • Remove graphics cards
  • Install Ubuntu following these instructions
  • Install Nvidia graphics driver from PPA repository.
  • Put graphics cards back in.
  • verify that you can boot into Windows and Ubuntu
  • Ubuntu sometimes freezes on shutdown due to an I2C driver, which can be blacklisted
  • If Ubuntu doesn't login and graphics are weird, probably using unsigned Nvidia driver, use this script to sign kernel modules. - if this path is taken, modules need to be signed against any new future kernels.
  • Install CUDA 8.0 and CUDnn 6.0 for Tensorflow 1.4

Setup SSH Server

  • The SSH server config (/etc/ssh/sshd_config) should be hardened to prevent brute force password attacks.
  • Broadly, follow these instructions to disable password access. Maybe a good idea to use fail2ban or deny hosts, but just disabling password access should be a good start.
  • Regular users should copy their public key to the machine in person and add it to ~/.ssh/authorized_keys
  • ssh-copy-id won't work if password access is turned off :-\

Setup Dynamic DNS

protocol=duckdns
password=token-from-duck-dns
chosen-host-url.duckdns.org
run_daemon="true"

  • start service: sudo service ddclient start
  • show status: sudo service ddclient status

Remote Connections

Port forwarding for Jupyter Notebook

Here’s a trick to use jupyter notebook in your local laptop browser, but to do the work on our GPU workstation, Dasher.

Login to Dasher via SSH with the port forwarding switches:

ssh -L 8888:localhost:8888 charles@dasher-robin.duckdns.org

This maps the address “localhost:8888” on dasher to port “localhost:8888” on your laptop.

Then you can run Jupyter on the ssh connection:

Jupyter notebook

When Jupyter runs it shows you a special URL for the notebook session, including a unique token. Copy this, then on your local computer, open a browser and go to:

http://localhost:8888/?token=TOKEN_TEXT_HERE

Now you’re ready to compute something difficult!

More info: Ubuntu Help on Port Forwarding

Personlige verktøy