Meet our 128-GPU NVIDIA Cluster
A Few Technical Specs
qty 1 HP DL380 G5 Xeon admin node
qty 64 HP DL160 G6 computation nodes
512 Xeon X5550 2.67GHz cores (8 cores per node)
1.5TB of memory, 24GB of RAM per node
750GB of local scratch disk space per node
HP InfiniBand 4X DDR Conn-X PCI-E G2 Dual Port HCA
OS: Red Hat Enterprise Linux Server release 5.3
qty 32 NVIDIA Tesla S1070s
128 Tesla GPUs, 4 per 1U S1070
512GB of dedicated GPU memory (16GB per S1070, 8GB per compute node)
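For a sense of what a user sees after logging in, here is a minimal CUDA sketch (our own illustration, not cluster-provided code; the filename is hypothetical) that enumerates the devices on one compute node. Given the numbers above, each DL160 node should report 2 Tesla GPUs with 4GB of global memory each.

    // devices.cu - list the GPUs visible on one compute node.
    // Given the specs above, each DL160 should see 2 Tesla GPUs
    // (half an S1070) with 4GB of global memory each.
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        printf("GPUs visible on this node: %d\n", count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("  device %d: %s, %.1f GB global memory\n",
                   i, prop.name,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }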
The system has three networks:
1- InfiniBand with a Voltaire InfiniBand 4X DDR Rev B 96P Switch
This network is used for MPI and NFS traffic to the SGI file server, which has 8TB of dedicated space (see the MPI sketch after this list).
The IB network is connected to the SCI core switch and the SGI file server via a 10Gb fiber link.
2- 1Gb Ethernet network
This network is used for low-bandwidth access such as ssh, and to netboot all the nodes.
3- Lights out management network
The HP iLO2 network lets us reboot and manage crashed nodes.
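To illustrate how MPI traffic over the IB fabric typically combines with the GPUs, here is a hedged sketch (our own example; the launcher flags and the rank-to-node mapping are assumptions, not cluster policy) that binds each MPI rank to one of the two GPUs on its node.

    // Sketch: one MPI rank per GPU, with MPI messages carried over
    // InfiniBand. Assumes 2 ranks per node (one per Tesla GPU) and
    // ranks numbered consecutively within a node; launcher flags
    // vary by MPI implementation, e.g. mpirun -np 128 -npernode 2
    #include <stdio.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Bind this rank to one of the node's GPUs. With 2 ranks
        // per node, rank % 2 selects local device 0 or 1.
        int ngpus = 0;
        cudaGetDeviceCount(&ngpus);
        if (ngpus > 0)
            cudaSetDevice(rank % ngpus);

        printf("rank %d of %d using GPU %d of %d\n",
               rank, size, ngpus > 0 ? rank % ngpus : -1, ngpus);

        MPI_Finalize();
        return 0;
    }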
The nodes are netbooted using the standard Red Hat netboot node manager. This allows for easy software upgrades and also lets us easily run multiple OS versions across any combination of nodes.
Currently the cluster runs interactively, meaning that any user can ssh directly to a node and run jobs on as many nodes as they need.
In the future we may adopt cluster management and scheduling software; we are currently evaluating several packages.
GPU Research and Teaching Efforts
GKLEE - a concolic (concrete + symbolic) verifier and test generator for CUDA programs. Accepted at PPoPP 2012.
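As a toy illustration (our own example, not taken from the GKLEE paper), the kind of defect a concolic GPU verifier targets is an inter-thread data race like the one below, which conventional testing can easily miss because it depends on the thread schedule.

    // Toy CUDA kernel with a data race: thread i reads a[i+1]
    // while thread i+1 writes a[i+1], with no synchronization.
    // A concolic verifier can expose such races by reasoning over
    // thread interleavings rather than waiting for a failing run.
    __global__ void shift_left(int *a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n - 1) {
            a[i] = a[i + 1];   // racy read of a[i+1]
        }
    }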