Research Group of Carsten Urbach

Lattice QCD and Computational Physics

QBiG Cluster (lattice QCD in Bonn on GPUs)

The QBiG GPU cluster is funded by the DFG in the framework of the CRC 110. It consists of two parts. The more recent part, QBiG-II, consists of 5 nodes with 8 NVIDIA P100 cards each and has a peak performance of about 180 TFlops in double and about 373 TFlops in single precision. QBiG-I has a peak performance of 56 TFlops in double and 168 TFlops in single precision on 48 K20m GPUs.

The fast InfiniBand network allows users to run multi-GPU and multi-node programmes. QBiG is connected to 190 TByte of RAID disk storage via a Lustre file system.

Configuration QBiG-II (lnode13-lnode17)

Configuration QBiG-I (lnode01-lnode12)

CPU-only nodes (lcpunode01 and lcpunode02)

In addition, we provide some CPU-only nodes.

Access and Environment

The cluster can be accessed via the frontend node qbig.cluster.hiskp from within the HISKP VPN network only. Connect using ssh to qbig.itkp.uni-bonn.de. Every user has a directory on the frontend node in /hiskp4/username. The latter is a Lustre file system available via InfiniBand on all compute nodes.
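For example, to log in from within the VPN (username is a placeholder for your account name):

ssh username@qbig.itkp.uni-bonn.de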

Please note that the frontend node is for compiling and development only; do not run production jobs interactively on qbig. A few CPU slots are available on the frontend node, which can be used.

There are two MPI libraries installed, openMPI and MVAPICH2. Both can handle InfiniBand, but only the latter is compiled with GPU direct support. However, only with openMPI did I manage to get hybrid MPI+openMP jobs running. MVAPICH2 is the default you get when invoking mpicc and mpirun. If you want to use openMPI, you need to use mpicc.openmpi and mpirun.openmpi instead. Unfortunately, the man pages currently refer to MVAPICH2 only.

This means in particular that you need to recompile your application for either openMPI or MVAPICH2. Therefore, you have to compile with mpicc.openmpi if you want to run a hybrid MPI+openMP application.
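As a minimal sketch, building against one library or the other might look as follows (hybrid.c and solver.c are placeholder source files):

# hybrid MPI+openMP build against openMPI
mpicc.openmpi -fopenmp -O2 -o hybrid hybrid.c

# pure MPI build with the default wrappers (MVAPICH2)
mpicc -O2 -o solver solver.c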

File Systems

FS           Size             Visible on qbig   Visible on nodes   RAID
/hiskp4/     195 TB           yes               yes                Lustre
$SCRATCH1    ~1 TB per node   no                yes                no
$SCRATCH2    ~1 TB per node   no                yes                no
/qbigwork    7 TB             yes               yes                5
/qbigwork2   22 TB            yes               yes                5

For I/O-intensive jobs it is advisable to use $SCRATCH2 in order not to interfere with the operating system on the first disk.
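A possible pattern inside a job script, assuming $SCRATCH2 points to the node-local scratch directory (the file names below are placeholders), is to stage the input to the local disk, run there, and copy the results back:

# stage input to node-local scratch, run there, copy results back
cp /hiskp4/username/run-dir/input.dat $SCRATCH2/
cd $SCRATCH2
srun path-to-exec/executable input.dat
cp output.dat /hiskp4/username/run-dir/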

/qbigwork and /qbigwork2 are intended for backing up important data.

Batch Queuing

Batch queuing is done using SLURM. The most important commands are sbatch for submitting a job, squeue for listing jobs in the queue, and scancel for cancelling a job.
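A typical workflow might look as follows (job.sh and the job ID are placeholders):

sbatch job.sh      # submit the job script; prints the assigned job ID
squeue -u $USER    # list your own jobs currently in the queue
scancel 12345      # cancel the job with ID 12345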

The maximum walltime currently allowed is 36 hours; the default is one hour.

The default memory requirement is set to one GB. Please specify the memory requirements as precisely as possible! The limit is checked strictly and jobs exceeding it will be aborted.

Example Job Script

Single Node job

#!/bin/bash -x
#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --output=%x.%J.out
#SBATCH --error=%x.%J.out
#SBATCH --time=36:00:00
#SBATCH --mail-user=me@hiskp.uni-bonn.de
#SBATCH --mail-type=ALL
#SBATCH --mem=1500M

# use all cores allocated by SLURM for OpenMP and pin threads to cores
# (KMP_AFFINITY applies to the Intel OpenMP runtime)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export KMP_AFFINITY=balanced,granularity=fine,verbose

cd /hiskp4/username/run-dir/
srun path-to-exec/executable

cd -

Single Node job with GPUs

#!/bin/bash -x
#SBATCH --job-name=gpujob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --output=%x.%J.out
#SBATCH --error=%x.%J.out
#SBATCH --time=01:00:00
#SBATCH --mail-user=me@hiskp.uni-bonn.de
#SBATCH --mail-type=ALL
#SBATCH --gres=gpu:4
#SBATCH --mem=1G

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export KMP_AFFINITY=balanced,granularity=fine,verbose

cd /hiskp4/username/run-dir/
srun path-to-exec/executable

cd -