Benchmarks on the DLWSs
From Robin
Current revision as of 14:48, 14 December 2022
ROBIN GPU Benchmarking
We performed a benchmark on the deep learning workstations to assess their performance in terms of training throughput. We have documented the process for reproducibility.
Methodology
According to Lambda Labs, training throughput is a better measure of GPU performance for deep learning training than theoretical peak numbers such as FLOPS. Since we used image data, the unit is [images/sec].
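As an illustration, throughput is simply the number of images processed divided by the wall-clock time of the timed window. A minimal sketch (the variable values here are made up for illustration, not measured):

```python
# Throughput = images processed per second of wall-clock training time.
num_batches = 100       # batches processed during the timed window
batch_size = 112        # images per batch
elapsed_seconds = 21.1  # measured wall-clock time (illustrative value)

throughput = (num_batches * batch_size) / elapsed_seconds
print(f"{throughput:.2f} images/sec")
```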
Tested Environment
- OS: Red Hat Enterprise Linux release 8.7 (Ootpa)
- TensorFlow version: 2.4.1
- CUDA Version 11.7
- CUDNN Version 7.X.X
Benchmarking tool
We used a script that allows us to tweak several parameters:
- Toggle XLA
- Number of GPUs
- Number of Batches
- Number of Runs
- Model type (ResNet50, ResNet152, AlexNet, Inceptionv3, Inceptionv4, VGG-16)
- Precision (floating point 16 / 32)
- Inference / training mode
Dataset
The script can use either synthetic or real data. Synthetic data consists of images with random pixel values, generated directly on the GPU.
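To illustrate what such a synthetic batch looks like, here is a NumPy sketch with the standard ResNet50 input shape (224x224 RGB) and random ImageNet-style labels. In the real benchmark the tensors are generated on the GPU by TensorFlow; this sketch only shows the shapes and value ranges involved:

```python
import numpy as np

# Synthetic ImageNet-like batch: random pixel values instead of real photos.
batch_size = 112
images = np.random.uniform(0.0, 255.0,
                           size=(batch_size, 224, 224, 3)).astype(np.float32)
# Random class indices out of ImageNet's 1000 classes.
labels = np.random.randint(0, 1000, size=(batch_size,))

print(images.shape, labels.shape)
```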
Parameters for this benchmark
- XLA disabled
- 1 GPU
- 100 batches
- 1 run
- ResNet50 model
- fp32 precision
- training
- ImageNet (synthetic data)
Specifically, the command used is:
./batch_benchmark.sh 1 1 1 100 10 config/config_resnet50_replicated_fp32_train_syn
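The core of any such benchmark is a timed training loop with an untimed warm-up phase. The following is a framework-agnostic sketch of that idea, not the actual internals of batch_benchmark.sh; `train_step` and the warm-up count are placeholders:

```python
import time

def measure_throughput(train_step, num_batches, batch_size, warmup=10):
    """Run `warmup` untimed batches, then time `num_batches` batches
    and return throughput in images/sec."""
    for _ in range(warmup):          # let GPU clocks and caches settle
        train_step()
    start = time.perf_counter()
    for _ in range(num_batches):
        train_step()
    elapsed = time.perf_counter() - start
    return num_batches * batch_size / elapsed

# Usage with a dummy step that just sleeps for 1 ms:
tput = measure_throughput(lambda: time.sleep(0.001),
                          num_batches=100, batch_size=112)
print(f"{tput:.1f} images/sec")
```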
Results
| Workstation | GPU | GPU memory [MB] | Compute Capability | Batch size | Throughput [images/sec] |
|---|---|---|---|---|---|
| Rudolph | NVIDIA GeForce GTX 1080 Ti | 10216 | 6.1 | 56 / 112 | 211.78 / 213.38 |
| Dunder | NVIDIA GeForce RTX 3090 | 22378 | 8.6 | 112 | 532.09 |
| Dancer | NVIDIA GeForce RTX 3090 | 22373 | 8.6 | 112 | 531.04 |
| Vixen | NVIDIA GeForce RTX 3070 | 7144 | 8.6 | 40 / 112 | 285.59 / OOM* |
*OOM = out of memory error
The trend is clear: the more GPU memory and the higher the compute capability, the higher the training throughput. This matches our expectations.
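For a concrete comparison, the RTX 3090 machines deliver roughly 2.5x the ResNet50 throughput of the GTX 1080 Ti (values taken from the results table above, at batch size 112):

```python
# Throughput per workstation from the results table [images/sec].
rudolph = 213.38  # GTX 1080 Ti, batch size 112
dunder = 532.09   # RTX 3090, batch size 112

speedup = dunder / rudolph
print(f"{speedup:.2f}x")  # roughly 2.5x
```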
Notes
- The default batch size is automatically determined by the script, as a function of the available GPU RAM size and the number of tunable parameters in the chosen model. For more details about choosing batch sizes, see here
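A rough version of such a heuristic might subtract a fixed budget for model weights (plus gradient and optimizer copies) and a safety reserve from the available GPU memory, then divide by a per-image activation cost. This is entirely illustrative; the script's actual formula may differ, and the per-image cost and reserve below are assumed values:

```python
def default_batch_size(gpu_mem_mb, param_count, bytes_per_param=4,
                       mem_per_image_mb=80, reserve_mb=1024):
    """Illustrative heuristic: memory left after weights, gradients and
    momentum buffers plus a fixed reserve, divided by a per-image cost."""
    weights_mb = param_count * bytes_per_param * 3 / 1e6  # weights + grads + momentum
    free_mb = gpu_mem_mb - weights_mb - reserve_mb
    return max(1, int(free_mb // mem_per_image_mb))

# ResNet50 has ~25.6M parameters.
print(default_batch_size(10216, 25_600_000))  # GTX 1080 Ti memory
print(default_batch_size(22378, 25_600_000))  # RTX 3090 memory
```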
- It would be interesting to see how much the XLA option increases the training throughput.
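For a follow-up run, XLA auto-clustering can be enabled for any TensorFlow program via the standard `TF_XLA_FLAGS` environment variable (or in code with `tf.config.optimizer.set_jit(True)`), without modifying the benchmark script itself:

```shell
# Enable XLA auto-clustering (JIT compilation) for TensorFlow;
# --tf_xla_auto_jit=2 compiles all eligible clusters.
export TF_XLA_FLAGS=--tf_xla_auto_jit=2

# Then re-run the benchmark as before, e.g.:
#   ./batch_benchmark.sh 1 1 1 100 10 config/config_resnet50_replicated_fp32_train_syn
```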