Infrastructure

The Lab

Design Goals

The lab was designed in consultation with the groups interested in using it. This consultation identified three main design goals: maximizing inter-node bandwidth, minimizing inter-node latency, and providing a reasonably large graphics memory per GPU.

With these goals in mind, thorough market research led us to the following configuration as the optimal solution: 1 head node, 3 lower-performance development machines, and 4 more powerful cluster machines.

The third development machine, the Intel Phi accelerated node, and the InfiniBand interconnect will be installed once maintenance in the server room is complete.

Terms of use

We do not charge our users in any way; instead, the upkeep of the lab is secured by the number of publications that these facilities have enabled. We therefore require all users to mention "HAS Wigner RCP - GPU-Lab" in the acknowledgements of such publications.

Usage

At present, all machines can be accessed directly via SSH from within the institute. From outside the institute, only the head node is accessible.

IMPORTANT: for security reasons, the head node accepts SSH connections only on the non-standard port 2222 (instead of port 22). Anyone who tries to connect on any other port will have to wait 15 minutes before trying again.
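
As a hedged illustration of the access rules above, the sketch below assembles the corresponding 'ssh' command lines. The user name, head node address and cluster host name are placeholders (assumptions), since this page does not state them; only port 2222 comes from the text. The '-J' (jump host) option requires OpenSSH 7.3 or newer on the client.

```shell
# Placeholders (assumptions): substitute your account and the real host names.
LAB_USER="username"
LAB_HEAD="headnode.example.org"
LAB_NODE="cluster1"
LAB_PORT=2222   # non-standard SSH port of the head node, as stated above

# Print the command for logging in to the head node.
head_node_cmd() {
    printf 'ssh -p %s %s@%s\n' "$LAB_PORT" "$LAB_USER" "$LAB_HEAD"
}

# Print the command for reaching an internal machine from outside,
# using the head node as a jump host.
jump_cmd() {
    printf 'ssh -J %s@%s:%s %s@%s\n' \
        "$LAB_USER" "$LAB_HEAD" "$LAB_PORT" "$LAB_USER" "$LAB_NODE"
}

head_node_cmd
jump_cmd
```

The functions only print the commands, so the port can be checked before connecting; a mistyped port would trigger the 15-minute lockout.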

Recommended workflow

  1. Log in to the head node and/or any development machine.
  2. Develop the code in the shared home folder.
  3. Build the application (for large applications, the head node is strongly recommended).
  4. Test.
  5. Once the application is stable, log in to any of the cluster machines.
  6. Run the application.
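
The steps above can be sketched as a short command sequence. Everything specific here is an assumption made for illustration: the host names, the project directory 'myproject', the binary 'myapp', and the use of 'make' as the build tool. The script only prints the commands (a dry run), so it is safe to run anywhere.

```shell
# Hypothetical names throughout: hosts, project directory and binary are placeholders.
HEAD="headnode.example.org"
NODE="cluster1"
PROJECT="myproject"
APP="myapp"

# Print each step instead of executing it.
workflow() {
    echo "ssh -p 2222 $HEAD"           # 1. log in to the head node
    echo "cd ~/$PROJECT"               # 2. develop in the shared home folder
    echo "make -j 8"                   # 3. build (head node recommended for large builds)
    echo "./$APP --help"               # 4. quick test (assumes the binary accepts --help)
    echo "ssh $NODE"                   # 5. move to a cluster machine
    echo "cd ~/$PROJECT && ./$APP"     # 6. run from the same shared home folder
}

workflow
```

Because the home directory is shared and the machines are binary compatible, the binary built in step 3 should not need rebuilding on the cluster machine.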

Useful commands

The 'top' command can be used on any machine to see which users are currently running programs.
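
For a non-interactive snapshot, 'top' can be run in batch mode, as in the sketch below. The options '-b' (batch mode) and '-n' (number of iterations) belong to the procps 'top' shipped with Ubuntu. The per-user process count using 'ps' is an addition for illustration and is not mentioned elsewhere on this page.

```shell
# One batch-mode snapshot of the process list (first 15 lines only).
top -b -n 1 | head -n 15

# Count running processes per user, busiest account first.
procs_per_user() {
    ps -eo user= | sort | uniq -c | sort -rn
}

procs_per_user
```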

The GPU drivers report current load and thermal information. On NVIDIA machines, use the 'nvidia-smi' command. On AMD machines, use 'aticonfig --adapter=ALL' together with either the '--odgc' (OverDrive Get Clock) or the '--odgt' (OverDrive Get Temperature) switch (the names speak for themselves). The 'aticonfig' command works only after issuing 'export DISPLAY=:0', so it cannot be used in shells that have X forwarding enabled.

To follow this information continuously, one can "sacrifice" a shell and wrap the above commands in the 'watch' utility. For example: watch -n 0,5 'aticonfig --adapter=ALL --odgc'
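
A minimal sketch of a helper that returns the monitoring command for a given GPU vendor is shown below. The function name 'gpu_monitor_cmd' and its vendor argument are hypothetical. The '--query-gpu' and '--format=csv' options are part of 'nvidia-smi'; the 'aticonfig' switches are the ones described above. The 'watch' lines are left commented out because they need the matching driver tools to be installed.

```shell
# Print the monitoring command for a GPU vendor ("nvidia" or "amd").
gpu_monitor_cmd() {
    case "$1" in
        nvidia) echo "nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu --format=csv" ;;
        amd)    echo "aticonfig --adapter=ALL --odgt" ;;
        *)      echo "unknown vendor: $1" >&2; return 1 ;;
    esac
}

gpu_monitor_cmd nvidia
gpu_monitor_cmd amd

# Refresh once per second in a dedicated shell.
# On AMD machines, DISPLAY must point at the local X server first.
# export DISPLAY=:0
# watch -n 1 "$(gpu_monitor_cmd amd)"
```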

Equipment

Hierarchy

Head node

The machines of the cluster are directly accessible only from within the campus; from outside, they can be reached only through the head node. The head node serves multiple purposes (such as serving this page), but most importantly it is the build server for large applications, hosts the common home directory, and can act as the head node for cluster-parallel applications. The head node is equipped with AMD GPU cards, but accelerating applications is not its main purpose.

| Opteron (1x) | Single Precision (GFLOPS) | Double Precision (GFLOPS) | Tot SP (GFLOPS) | Tot DP (GFLOPS) |
|---|---|---|---|---|
| Barebone: ASUS RS924A-E6/RS8 | | | | |
| Processor: 4*AMD Opteron™ 6376 | 166.4 | 166.4 | 665.6 | 665.6 |
| RAM: 4*32GB 1333MHz DDR3L ECC Reg CL9 DIMM | | | | |
| GPU: 2*AMD Radeon™ R9 270X, 4096MB GDDR5 | 2560 | 160 | 5120 | 320 |

Development machines

One of the machines is equipped with AMD graphics cards and is therefore primarily meant for OpenCL programming, while another is equipped with NVIDIA Tesla cards and can be programmed using either CUDA or OpenCL.

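| Radeon (1x) | Single Precision (GFLOPS) | Double Precision (GFLOPS) | Tot SP (GFLOPS) | Tot DP (GFLOPS) |
|---|---|---|---|---|
| Motherboard: ASUS P6T6 WS Revolution | | | | |
| Processor: Intel® Core™ i7-920 Processor | 89.6 | 89.6 | 89.6 | 89.6 |
| RAM: 6*2GB Kingston DDR3 1333MHz ECC | | | | |
| GPU: 2*AMD Radeon R9 270X Graphics | 2560 | 160 | 5120 | 320 |

| Tesla (1x) | Single Precision (GFLOPS) | Double Precision (GFLOPS) | Tot SP (GFLOPS) | Tot DP (GFLOPS) |
|---|---|---|---|---|
| Motherboard: ASUS P6T6 WS Revolution | | | | |
| Processor: Intel® Core™ i7-920 Processor | 89.6 | 89.6 | 89.6 | 89.6 |
| RAM: 3*2GB Kingston DDR3 1333MHz ECC | | | | |
| GPU: 2*NVIDIA GeForce GTX 980 Graphics | 4612 | 144 | 9224 | 288 |

| Phi (1x) | Single Precision (GFLOPS) | Double Precision (GFLOPS) | Tot SP (GFLOPS) | Tot DP (GFLOPS) |
|---|---|---|---|---|
| Motherboard: ASUS P9X79 WS | | | | |
| Processor: Intel® Xeon® Processor E5-1650v2 | 168 | 168 | 168 | 168 |
| RAM: 4*8GB Kingston DDR3 1600MHz ECC | | | | |
| MIC: 2*Intel® Xeon® Phi Co-Processor 3120A | 2006 | 1003 | 4012 | 2006 |

| Cluster (4x) | Single Precision (GFLOPS) | Double Precision (GFLOPS) | Tot SP (GFLOPS) | Tot DP (GFLOPS) |
|---|---|---|---|---|
| Barebone: ASUS ESC4000FDR/G2 | | | | |
| Processor: 2*Intel® Xeon® E5-2650 | 166.4 | 166.4 | 1331.2 | 1331.2 |
| RAM: 4*8GB Kingston DDR3 1600MHz ECC | | | | |
| GPU: 4*AMD Radeon HD7970 | 4301 | 1075 | 68816 | 17200 |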

Computing Capacities

| | Floating Point 32 (TFLOPS) | Floating Point 64 (TFLOPS) |
|---|---|---|
| Total CPU | 2.344 | 2.344 |
| Total GPU | 88.28 | 18.128 |
| Total MIC | 4.012 | 2.006 |
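
The totals above can be reproduced by summing the 'Tot SP' and 'Tot DP' values of every machine in the Equipment section. The sketch below does this under the assumption that the per-machine figures are peak GFLOPS and the totals are TFLOPS, which the numbers are consistent with. The helper name 'tflops' is hypothetical.

```shell
# Sum per-machine totals (assumed to be GFLOPS) and convert to TFLOPS.
# Arguments: the "Tot" column values of every machine for one device class.
tflops() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1 } END { printf "%.3f\n", sum / 1000 }'
}

tflops 665.6 89.6 89.6 168 1331.2   # CPU, FP32 and FP64: head node, 3 development machines, cluster
tflops 5120 5120 9224 68816         # GPU, FP32
tflops 320 320 288 17200            # GPU, FP64
tflops 4012                         # MIC, FP32
tflops 2006                         # MIC, FP64
```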

Software

Principles

All machines in the GPU-Lab share the same operating system, compilers and libraries, which makes them binary compatible. Every machine has the compilers installed, along with the SDKs appropriate to its GPUs and the matching FFT and linear algebra libraries. For the capabilities and usage of the FFT and BLAS libraries, consult their respective documentation. We recommend these libraries even to experienced GPGPU programmers: they are often tuned specifically for each target architecture, or have simply had substantially more thought put into them than a custom-made replacement.

Should any software be missing, the administrators will do their best to address the issue. We do not intend to change the operating system.

Versions

| OS/Application/Library | Version |
|---|---|
| Ubuntu Server 64-bit | 16.04 |
| NVIDIA Display Driver | 375.39 |
| AMD APP SDK | 3.0 |
| AMD APP ML (clAmdFFT & clAmdBLAS) | 1.10 |
| CUDA Toolkit (cuFFT & cuBLAS) | 7.0 |
| GCC | 5.3.1 |
| Clang | 3.8 |
| OpenMPI | 1.6 |
| Thrust | 1.6 |
| VirtualGL | 2.3.3 |
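
To compare a machine against the versions listed above, the command-line tools can be queried directly. The sketch below is one possible way to do so: the helper name 'report_version' is hypothetical, and it assumes each tool accepts '--version', which holds for 'gcc', 'clang', 'nvcc' and Open MPI's 'mpirun'. It prints the first output line that contains a dotted version number, or notes that the tool is not on the PATH.

```shell
# Report the first "--version" output line that contains a dotted number,
# or state that the tool is not installed on this machine.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | grep -m 1 -E '[0-9]+\.[0-9]+')"
    else
        printf '%s: not found\n' "$1"
    fi
}

for tool in gcc clang nvcc mpirun; do
    report_version "$tool"
done
```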