Deep Learning Workstations

From Robin

Revision as of 15 January 2018, 15:40


We have shared workstations for projects needing serious GPU and CPU power while retaining physical access to a computer.

The most interesting are the four Dell Alienware computers from January 2018. Their details and responsible staff are recorded here for ongoing record keeping:

  • Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113616). Justas/Zia/Weria
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:84:B7
    • ETH: 9C:30:5B:13:AF:33
  • Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1080ti (UiO: 113615). Charles/Kai/Tønnes
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:7E:D1
    • ETH: 9C:30:5B:13:C5:69
  • Aurora R7 - Intel i7 8700K (6-core), 2x Nvidia GTX1070ti (UiO: 113617). Vegard/Master's students
    • hostname:
    • URL:
    • WLAN: D8:9E:F3:7A:46:08
    • ETH: 9E:30:5B:13:C5:8B
  • Area 51 R3 - AMD Threadripper 1950x (16-core), 1x Nvidia GTX1070ti (UiO: 113614). Jørgen etc.
    • hostname:
    • URL:
    • WLAN: 9C:30:5B:13:C5:71
    • ETH1: 30:9C:23:2A:EB:39
    • ETH2: 30:9C:23:2A:EB:38

We also have older workstations:

  • Deep Thinker: Intel..., 2x Nvidia GTX1080. Charles/Justas/Master's students
    • hostname: deepthinker
    • url: deepthinker.onthewifi.com

Setting up the workstations

As a rule, shared systems should be able to dual-boot between Windows 10 and Ubuntu.

  • For the Dell systems, first shrink the main NTFS partition to allow an Ubuntu system partition.
  • The shared (spinning) disk can stay as NTFS.
  • Ubuntu 17.10 is OK for deep learning. Whether it also suits robotics applications is still an open question.

Tensorflow

  • Install the tensorflow-gpu package to use the GPUs.
  • TensorFlow requires specific versions of CUDA and cuDNN:
    • TensorFlow 1.4 (current release as of 9 January 2018) needs CUDA 8 and cuDNN 6.
    • TensorFlow 1.5 (prerelease as of 9 January 2018) works with CUDA 9 and cuDNN 7 (the latest versions).
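
The version pairing above can be captured as a small lookup, which may be handy when scripting an install check. This is only a sketch: the table holds just the two releases listed above, and the names `REQUIREMENTS` and `required_libs` are illustrative, not part of TensorFlow.

```python
# CUDA/cuDNN versions per TensorFlow release, copied from the list above.
# Both names below are hypothetical helpers, not a TensorFlow API.
REQUIREMENTS = {
    "1.4": {"cuda": "8", "cudnn": "6"},
    "1.5": {"cuda": "9", "cudnn": "7"},
}


def required_libs(tf_version):
    """Return the CUDA/cuDNN pair for a version string such as '1.4.1'."""
    # Only the major.minor part decides which CUDA/cuDNN pair is needed.
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in REQUIREMENTS:
        raise KeyError("no requirement recorded for TensorFlow " + major_minor)
    return REQUIREMENTS[major_minor]
```

For example, `required_libs("1.4.0")` looks up the entry for 1.4. A version that is not in the table raises `KeyError` rather than guessing.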

Ubuntu 16.04 Setup

Install Ubuntu 16.04. Do not turn Secure Boot off.

  • Shrink the Windows NTFS volume in Disk Management.
  • Remove the graphics cards.
  • Install Ubuntu following these instructions.
  • Install the Nvidia graphics driver from the PPA repository.
  • Put the graphics cards back in.
  • Verify that you can boot into both Windows and Ubuntu.
  • Ubuntu sometimes freezes on shutdown because of an I2C driver, which can be blacklisted.
  • If Ubuntu won't log in and the graphics look wrong, the Nvidia driver is probably unsigned. Use this script (https://gist.github.com/Garoe/74a0040f50ae7987885a0bebe5eda1aa) to sign the kernel modules. If you take this route, the modules must be signed again for every new kernel.
  • Install CUDA 8.0 and cuDNN 6.0 for TensorFlow 1.4.
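
Before reaching for the signing script, it can help to check whether the installed Nvidia module carries a signature at all. The sketch below assumes that `modinfo` prints a `signer:` field for signed modules and omits it for unsigned ones. The function names are illustrative, and the module name passed to `check_module` is something you must supply, as it depends on the driver package.

```python
import subprocess


def is_signed(modinfo_text):
    """Return True if modinfo output contains a non-empty 'signer:' field."""
    for line in modinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "signer" and value.strip():
            return True
    return False


def check_module(name):
    """Run `modinfo NAME` and report whether the module appears signed."""
    # check=True raises CalledProcessError if modinfo cannot find the module.
    result = subprocess.run(["modinfo", name], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return is_signed(result.stdout)
```

A call such as `check_module("nvidia_384")` uses a hypothetical module name. `lsmod` or the driver package name should show the real one on a given machine.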

Set Up SSH Server

  • The SSH server config (/etc/ssh/sshd_config) should be hardened to prevent brute-force password attacks.
  • Broadly, follow these instructions (https://askubuntu.com/questions/2271/how-to-harden-an-ssh-server) to disable password access and set up fail2ban and DenyHosts.
  • Regular users should bring their public key to the machine in person and add it to ~/.ssh/authorized_keys.
  • ssh-copy-id won't work for a new user once password access is turned off, because it needs a working login to install the key.
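
After editing the config, a quick check that password logins are really off can catch a forgotten or commented-out line. The following is a rough sketch, not a full sshd_config parser: it ignores `Match` blocks, included files and the `Keyword=value` form, and the function name is illustrative. It relies on two OpenSSH behaviours: the first value found for a keyword wins, and password authentication is on by default when the keyword is absent.

```python
def password_auth_enabled(config_text):
    """Return True if the given sshd_config text leaves password logins on."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        # Keywords are case-insensitive; the first occurrence takes effect.
        if len(tokens) >= 2 and tokens[0].lower() == "passwordauthentication":
            return tokens[1].lower() != "no"
    # Keyword absent: OpenSSH defaults to allowing password authentication.
    return True
```

Typical use would be to read /etc/ssh/sshd_config and pass its contents in. For the authoritative answer, `sshd -T` prints the effective configuration as the server itself resolves it.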