
Large Data Visualization on Distributed Memory Multi-GPU Clusters T. Fogal, H. Childs, S. Shankar, J. Krueger, R.D. Bergeron, P. Hatcher. In Proceedings of High Performance Graphics 2010, pp. 57--66. 2010.
Data sets of immense size are regularly generated on large-scale computing resources. Even among more traditional methods for acquiring volume data, such as MRI and CT scanners, data that is too large to be visualized effectively on standard workstations is now commonplace.
One solution to this problem is to employ a 'visualization cluster', a small- to medium-scale cluster dedicated to performing visualization and analysis of massive data sets generated on larger-scale supercomputers. These clusters are designed to fit a different need than traditional supercomputers, and therefore their design mandates different hardware choices, such as increased memory and, more recently, graphics processing units (GPUs). While there has been much previous work on distributed memory visualization as well as GPU visualization, there is a relative dearth of algorithms that effectively use GPUs at a large scale in a distributed memory environment. In this work, we study a common visualization technique in a GPU-accelerated, distributed memory setting, and present performance characteristics when scaling to extremely large data sets.
Assembling Large Mosaics of Electron Microscope Images using GPU. Kannan UV, M. Kim, D. Gerszewski, J.R. Anderson, M. Hall. In Proceedings of the 2009 Symposium on Application Accelerators in High Performance Computing (SAAHPC’09). 2009.
Understanding the neural circuitry of the retina requires mapping the connectivity of individual neurons in large neuronal tissue sections and analyzing signal communication across processes from electron microscopy images. One of the major bottlenecks in this critical path is image mosaicking, in which 2D slices are assembled from scanned microscopy image tiles. Assembling the tiles is computationally non-trivial because of overlap between the scanned tiles and distortion of the specimen caused by heat in the electron microscope. The complexity of the computation arises from the massive size of the dataset and the calculations required to compute the value of each pixel of the mosaic. We propose using texture memory lookups to speed up access to image tiles, and the data-parallel computing enabled by GPUs to accelerate this process. The proposed method yields noticeable improvements in computation speed compared to other methods.
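To make the texture-lookup idea concrete: a GPU texture fetch with linear filtering returns a bilinear blend of the four nearest texels at fractional coordinates, which is the per-pixel resampling a warped mosaic needs. The pure-Python sketch below assumes single-channel tiles stored as lists of rows, clamp-to-edge addressing, and a caller-supplied inverse mapping from mosaic to tile coordinates. `bilinear_sample`, `warp_tile`, and the mapping are hypothetical illustrations, not the authors' code, and hardware details such as half-texel offsets and reduced-precision filter weights are ignored.

```python
import math


def bilinear_sample(tile, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y) with bilinear
    interpolation and clamp-to-edge addressing. This approximates what a GPU
    texture fetch with linear filtering returns (hardware half-texel offsets
    and reduced-precision weights are ignored here)."""
    h, w = len(tile), len(tile[0])

    def texel(i, j):
        # Clamp-to-edge addressing: out-of-range indices reuse the border texel.
        i = min(max(i, 0), h - 1)
        j = min(max(j, 0), w - 1)
        return tile[i][j]

    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two neighbouring rows, then vertically.
    top = (1 - fx) * texel(y0, x0) + fx * texel(y0, x0 + 1)
    bottom = (1 - fx) * texel(y0 + 1, x0) + fx * texel(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def warp_tile(tile, out_w, out_h, transform):
    """Fill an out_h x out_w output block by mapping each output pixel
    through `transform` (a hypothetical inverse mapping from mosaic to tile
    coordinates) and sampling the tile. On a GPU, each output pixel would be
    computed independently by its own thread."""
    return [[bilinear_sample(tile, *transform(u, v)) for u in range(out_w)]
            for v in range(out_h)]
```

For example, `warp_tile(tile, 2, 2, lambda u, v: (u + 0.5, v))` resamples a tile shifted by half a pixel; in a CUDA version, each output pixel would map to one thread and the sampling would be a single texture fetch.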
Acceleration of Cardiac Tissue Simulation with Graphic Processing Units. D. Sato, Y. Xie, J.N. Weiss, Z. Qu, A. Garfinkel, A.R. Sanderson. In Medical and Biological Engineering and Computing, No. DOI:10.1007/s11517-0, Note: Published online Aug 5, 2009, 2009.
In this technical note we show the promise of using graphics processing units (GPUs) to accelerate simulations of electrical wave propagation in cardiac tissue, one of the more demanding computational problems in cardiology. We found that two-dimensional (2D) tissue simulations run about 30 times faster on a single commercially available GPU than on a single 2.0 GHz Advanced Micro Devices (AMD) Opteron processor. We also simulated wave conduction in the three-dimensional (3D) anatomical heart, where a single GPU was 1.6 times slower than a 32-central processing unit (CPU) Opteron cluster. However, a cluster with two or four GPUs was faster than the CPU-based cluster. These results demonstrate that a commodity personal computer can perform a whole-heart simulation of electrical wave conduction in times short enough for investigators to interact more easily with their simulations.
A Framework for Exploring Numerical Solutions of Advection Reaction Diffusion Equations using a GPU Based Approach. A.R. Sanderson, M.R. Meyer, R.M. Kirby, C.R. Johnson. In Journal of Computing and Visualization in Science, Vol. 12, pp. 155--170. 2009.
In this paper we describe a general-purpose graphics processing unit (GP-GPU)-based approach for solving partial differential equations (PDEs) within advection–reaction–diffusion models. The GP-GPU-based approach provides a platform for solving PDEs in parallel and can thus significantly reduce solution times compared with traditional CPU implementations. This allows for more efficient exploration of various advection–reaction–diffusion models, as well as the parameters that govern them. Although the GPU does impose limitations on the size and accuracy of computations, the PDEs describing the advection–reaction–diffusion models of interest to us fit comfortably within these constraints. Furthermore, GPU technology continues to increase rapidly in speed, memory, and precision, so applying these techniques to larger systems should be possible in the future. We chose to solve the PDEs using two numerical approaches: for the diffusion, a first-order explicit forward Euler solution and a semi-implicit second-order Crank–Nicolson solution; and, for the advection and reaction, a first-order explicit solution. The goal of this work is to provide motivation and guidance to application scientists interested in exploring the use of the GP-GPU computational framework in the course of their research. We present a rigorous comparison of our GPU-based advection–reaction–diffusion code model with a CPU-based analog, finding that the GPU model outperforms the CPU implementation in one-to-one comparisons.
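The two diffusion schemes named in this abstract differ in when the Laplacian is evaluated. The pure-Python sketch below assumes a 1D uniform grid, a constant diffusion coefficient folded into r = D*dt/dx^2, and fixed (Dirichlet) boundary values. These are simplifications chosen for illustration rather than taken from the paper, and the function names are hypothetical. Forward Euler updates each point from its neighbours and is stable only for r <= 1/2. Crank–Nicolson averages the explicit and implicit Laplacians, solving the tridiagonal system (I - (r/2) L) u_new = (I + (r/2) L) u_old each step.

```python
def euler_step(u, r):
    """One explicit forward Euler step of u_t = D u_xx on a uniform 1D grid.
    r = D*dt/dx**2; stable only for r <= 0.5. Endpoints are held fixed
    (Dirichlet boundaries)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a tridiagonal system with sub-diagonal a,
    diagonal b, super-diagonal c and right-hand side d (a[0], c[-1] unused)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson_step(u, r):
    """One Crank-Nicolson step, (I - r/2 L) u_new = (I + r/2 L) u_old, with
    fixed endpoint values (needs len(u) >= 3). Unconditionally stable and
    second-order accurate in time."""
    n = len(u)
    interior = n - 2
    a = [-r / 2] * interior
    b = [1 + r] * interior
    c = [-r / 2] * interior
    d = []
    for i in range(1, n - 1):
        # Explicit half of the averaged Laplacian.
        rhs = u[i] + (r / 2) * (u[i - 1] - 2 * u[i] + u[i + 1])
        # Boundary values are known, so move their implicit terms to the RHS.
        if i == 1:
            rhs += (r / 2) * u[0]
        if i == n - 2:
            rhs += (r / 2) * u[-1]
        d.append(rhs)
    return [u[0]] + solve_tridiagonal(a, b, c, d) + [u[-1]]
```

On a GPU, the explicit update maps directly to one thread per grid point, whereas the Crank–Nicolson step needs a parallel tridiagonal solver (for example, cyclic reduction) in place of the sequential Thomas algorithm used here.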
Feasibility of GPU-Assisted Iterative Image Reconstruction for Mobile C-ARM CT. Y. Pan, R.T. Whitaker, A. Cheryauka, D. Ferguson. In Proceedings of SPIE Medical Imaging 2009, pp. (accepted). 2009.
Computed tomography (CT) has been extensively studied and is widely used for a variety of medical applications. The reconstruction of 3D images from a projection series is an important aspect of the modality. Reconstruction by filtered backprojection (FBP) is used by most manufacturers because of its speed, ease of implementation, and relatively few parameters. Iterative reconstruction methods have significant potential to provide superior performance with incomplete or noisy data, or with less-than-ideal geometries such as cone-beam systems. However, iterative methods have a high computational cost, and regularization is usually required to reduce the effects of noise. This paper studies the simultaneous algebraic reconstruction technique (SART), using the Feldkamp (FDK) filtered backprojection method as the initialization for iterative SART. Additionally, graphics hardware is used to speed up the SART implementation, with NVIDIA processors and the Compute Unified Device Architecture (CUDA) forming the platform for GPU computation. Total variation (TV) minimization is applied to regularize the SART results. Preliminary results of SART on a 3D Shepp–Logan phantom using TV regularization and GPU computation are presented, and potential improvements to the proposed framework are discussed.
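The SART update itself is compact. Each iteration forward-projects the current image, normalizes each ray's residual by that ray's total weight, and back-projects all residuals simultaneously, normalized by each voxel's total weight. The dense pure-Python sketch below assumes a small system matrix with nonnegative entries (rows are rays, columns are voxels) and a relaxation factor `relax`. It omits the FDK initialization (which could be passed in as `x0`), the TV regularization, and the CUDA implementation described in the abstract. `sart` is a hypothetical name, not the authors' code.

```python
def sart(A, b, iterations=50, relax=1.0, x0=None):
    """Simultaneous algebraic reconstruction technique (SART) for A x = b.

    Each iteration back-projects all normalized ray residuals at once:
        x_j += relax / col_sum_j * sum_i A[i][j] * (b_i - A_i . x) / row_sum_i
    A is a dense list of rows with nonnegative entries (ray i, voxel j).
    """
    m, n = len(A), len(A[0])
    row_sums = [sum(row) for row in A]
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(iterations):
        # Forward-project the current estimate and normalize each residual
        # by the ray's total weight.
        residual = []
        for i in range(m):
            proj = sum(A[i][j] * x[j] for j in range(n))
            residual.append((b[i] - proj) / row_sums[i] if row_sums[i] else 0.0)
        # Back-project the normalized residuals onto every voxel at once,
        # normalized by the voxel's total weight.
        for j in range(n):
            if col_sums[j]:
                corr = sum(A[i][j] * residual[i] for i in range(m))
                x[j] += relax * corr / col_sums[j]
    return x
```

On a GPU, the forward- and back-projection loops are the parts that parallelize naturally across rays and voxels. Large-scale implementations typically compute the entries of A on the fly from the scanner geometry rather than storing the matrix densely as this sketch does.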
Multi-Threaded Streaming Pipeline For VTK. H.T. Vo, C.T. Silva. SCI Technical Report, No. UUSCI-2009-005, SCI Institute, University of Utah, 2009.
In this document, we describe the implementation details of our proposal for modifying the VTK pipeline execution framework to support improved streaming and multi-threaded capabilities. We believe the approach reported here is the best way to begin adding this functionality to VTK. The plan is to first settle on the basic functionality; after that, it should not be hard to add the rest of the framework to VTK (e.g., support for streaming unstructured data structures and GPUs).