Designed especially for neurobiologists, FluoRender is an interactive tool for multi-channel fluorescence microscopy data visualization and analysis.
Deep brain stimulation
BrainStimulator is a set of networks that are used in SCIRun to perform simulations of brain stimulation such as transcranial direct current stimulation (tDCS) and magnetic transcranial stimulation (TMS).
Developing software tools for science has always been a central vision of the SCI Institute.

SCI Publications

2013


Y. Wan. “Fluorender, An Interactive Tool for Confocal Microscopy Data Visualization and Analysis,” Note: Ph.D. Thesis, School of Computing, University of Utah, June, 2013.

ABSTRACT

Confocal microscopy has become a popular imaging technique in biology research in recent years. It is often used to study three-dimensional (3D) structures of biological samples. Confocal data are commonly multi-channel, with each channel resulting from a different fluorescent staining. This technique also results finely detailed structures in 3D, such as neuron fibers. Despite the plethora of volume rendering techniques that have been available for many years, there is a demand from biologists for a flexible tool that allows interactive visualization and analysis of multi-channel confocal data. Together with biologists, we have designed and developed FluoRender. It incorporates volume rendering techniques such as a two-dimensional (2D) transfer function and multi-channel intermixing. Rendering results can be enhanced through tone-mappings and overlays. To facilitate analyses of confocal data, FluoRender provides interactive operations for extracting complex structures. Furthermore, we developed the Synthetic Brainbow technique, which takes advantage of the asynchronous behavior in Graphics Processing Unit (GPU) framebuffer loops and generates random colorizations for different structures in single-channel confocal data. The results from our Synthetic Brainbows, when applied to a sequence of developing cells, can then be used for tracking the movements of these cells. Finally, we present an application of FluoRender in the workflow of constructing anatomical atlases.

Keywords: confocal microscopy, visualization, software



L. Zhou, C.D. Hansen. “Interactive rendering and efficient querying for large multivariate seismic volumes on consumer level PCs,” In Proceedings of the 2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), pp. 117--118. 2013.
DOI: 10.1109/LDAV.2013.6675167

ABSTRACT

We present a volume visualization method that allows interactive rendering and efficient querying of large multivariate seismic volume data on consumer level PCs. The volume rendering pipeline utilizes a virtual memory structure that supports out-of-core multivariate multi-resolution data and a GPU-based ray caster that allows interactive multivariate transfer function design. A Gaussian mixture model representation is precomputed and nearly interactive querying is achieved by testing the Gaussian functions against user defined transfer functions on the GPU in the runtime. Finally, the method has been tested on a multivariate 3D seismic dataset which is larger than the size of the main memory of the testing machine.


2012


Y.-J. Ahn, C. Hoffmann, P. Rosen. “A Note on Circle Packing,” In Journal of Zhejiang University SCIENCE C, Vol. 13, No. 8, pp. 559--564. 2012.

ABSTRACT

The problem of packing circles into a domain of prescribed topology is considered. The circles need not have equal radii. The Collins-Stephenson algorithm computes such a circle packing. This algorithm is parallelized in two different ways and its performance is reported for a triangular, planar domain test case. The implementation uses the highly parallel graphics processing unit (GPU) on commodity hardware. The speedups so achieved are discussed based on a number of experiments.

Keywords: Circle packing, Algorithm performance, Parallel computation, Graphics processing unit (GPU)



C. Brownlee, J. Patchett, L.-T. Lo, D. DeMarle, C. Mitchell, J. Ahrens, C.D. Hansen. “A Study of Ray Tracing Large-scale Scientific Data in Parallel Visualization Applications,” In Proceedings of the Eurographics Symposium on Parallel Graphics and Visualization (2012), Edited by H. Childs and T. Kuhlen and F. Marton, pp. 51--60. 2012.

ABSTRACT

Large-scale analysis and visualization is becoming increasingly important as supercomputers and their simulations produce larger and larger data. These large data sizes are pushing the limits of traditional rendering algorithms and tools thus motivating a study exploring these limits and their possible resolutions through alternative rendering algorithms . In order to better understand real-world performance with large data, this paper presents a detailed timing study on a large cluster with the widely used visualization tools ParaView and VisIt. The software ray tracer Manta was integrated into these programs in order to show that improved performance could be attained with software ray tracing on a distributed memory, GPU enabled, parallel visualization resource. Using the Texas Advanced Computing Center’s Longhorn cluster which has multi-core CPUs and GPUs with large-scale polygonal data, we find multi-core CPU ray tracing to be significantly faster than both software rasterization and hardware-accelerated rasterization in existing scientific visualization tools with large data.

Keywords: kaust, scidac



C.-S. Chiang, C. Hoffmann, P. Rosen. “A Generalized Malfatti Problem,” In Computational Geometry: Theory and Applications, Vol. 45, No. 8, pp. 425--435. 2012.

ABSTRACT

Malfatti's problem, first published in 1803, is commonly understood to ask fitting three circles into a given triangle such that they are tangent to each other, externally, and such that each circle is tangent to a pair of the triangle's sides. There are many solutions based on geometric constructions, as well as generalizations in which the triangle sides are assumed to be circle arcs. A generalization that asks to fit six circles into the triangle, tangent to each other and to the triangle sides, has been considered a good example of a problem that requires sophisticated numerical iteration to solve by computer. We analyze this problem and show how to solve it quickly.

Keywords: Malfatti's problem, circle packing, geometric constraint solving, GPU programming



A. Duchowski, M. Price, M.D. Meyer, P. Orero. “Aggregate Gaze Visualization with Real-Time Heatmaps,” In Proceedings of the ACM Symposium on Eye Tracking Research and Applications (ETRA), pp. 13--20. 2012.
DOI: 10.1145/2168556.2168558

ABSTRACT

A GPU implementation is given for real-time visualization of aggregate eye movements (gaze) via heatmaps. Parallelization of the algorithm leads to substantial speedup over its CPU-based implementation and, for the first time, allows real-time rendering of heatmaps atop video. GLSL shader colorization allows the choice of color ramps. Several luminance-based color maps are advocated as alternatives to the popular rainbow color map, considered inappropriate (harmful) for depiction of (relative) gaze distributions.



L.K. Ha, J. Krüger, J.L.D. Comba, C.T. Silva, S. Joshi. “ISP: An Optimal Out-of-Core Image-Set Processing Streaming Architecture for Parallel Heterogeneous Systems,” In IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 18, No. 6, pp. 838--851. 2012.
DOI: 10.1109/TVCG.2012.32

ABSTRACT

Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution and disease of a population. However, these techniques often require excessive computational power and memory that are compounded with a large number of volumetric inputs. Restricted access to supercomputing power limits its influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP we introduce specially-designed streaming algorithms and data structures that provide an optimal solution for out-of-core multi-image processing problems both in terms of memory usage and computational efficiency. ISP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.



J. Huang, W. Pei, C. Wen, G. Chen, W. Chen, H. Bao. “Output-Coherent Image-Space LIC for Surface Flow Visualization,” In Proceedings of the IEEE Pacific Visualization Symposium 2012, Korea, pp. 137--144. 2012.

ABSTRACT

Image-space line integral convolution (LIC) is a popular approach for visualizing surface vector fields due to its simplicity and high efficiency. To avoid inconsistencies or color blur during the user interactions in the image-space approach, some methods use surface parameterization or 3D volume texture for the effect of smooth transition, which often require expensive computational or memory cost. Furthermore, those methods cannot achieve consistent LIC results in both granularity and color distribution on different scales.

This paper introduces a novel image-space LIC for surface flows that preserves the texture coherence during user interactions. To make the noise textures under different viewpoints coherent, we propose a simple texture mapping technique that is local, robust and effective. Meanwhile, our approach pre-computes a sequence of mipmap noise textures in a coarse-to-fine manner, leading to consistent transition when the model is zoomed. Prior to perform LIC in the image space, the mipmap noise textures are mapped onto each triangle with randomly assigned texture coordinates. Then, a standard image-space LIC based on the projected vector fields is performed to generate the flow texture. The proposed approach is simple and very suitable for GPU acceleration. Our implementation demonstrates consistent and highly efficient LIC visualization on a variety of datasets.



A. Humphrey, Q. Meng, M. Berzins, T. Harman. “Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System,” SCI Technical Report, No. UUSCI-2012-003, SCI Institute, University of Utah, 2012.

ABSTRACT

The Uintah Computational Framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, together with a novel asynchronous task-based approach with fully automated load balancing. Uintah demonstrates excellent weak and strong scalability at full machine capacity on XSEDE resources such as Ranger and Kraken, and through the use of a hybrid memory approach based on a combination of MPI and Pthreads, Uintah now runs on up to 262k cores on the DOE Jaguar system. In order to extend Uintah to heterogeneous systems, with ever-increasing CPU core counts and additional onnode GPUs, a new dynamic CPU-GPU task scheduler is designed and evaluated in this study. This new scheduler enables Uintah to fully exploit these architectures with support for asynchronous, outof- order scheduling of both CPU and GPU computational tasks. A new runtime system has also been implemented with an added multi-stage queuing architecture for efficient scheduling of CPU and GPU tasks. This new runtime system automatically handles the details of asynchronous memory copies to and from the GPU and introduces a novel method of pre-fetching and preparing GPU memory prior to GPU task execution. In this study this new design is examined in the context of a developing, hierarchical GPUbased ray tracing radiation transport model that provides Uintah with additional capabilities for heat transfer and electromagnetic wave propagation. The capabilities of this new scheduler design are tested by running at large scale on the modern heterogeneous systems, Keeneland and TitanDev, with up to 360 and 960 GPUs respectively. On these systems, we demonstrate significant speedups per GPU against a standard CPU core for our radiation problem.

Keywords: csafe, uintah



A. Humphrey, Q. Meng, M. Berzins, T. Harman. “Radiation Modeling Using the Uintah Heterogeneous CPU/GPU Runtime System,” In Proceedings of the first conference of the Extreme Science and Engineering Discovery Environment (XSEDE'12), Association for Computing Machinery, 2012.
DOI: 10.1145/2335755.2335791

ABSTRACT

The Uintah Computational Framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, together with a novel asynchronous task-based approach with fully automated load balancing. Uintah demonstrates excellent weak and strong scalability at full machine capacity on XSEDE resources such as Ranger and Kraken, and through the use of a hybrid memory approach based on a combination of MPI and Pthreads, Uintah now runs on up to 262k cores on the DOE Jaguar system. In order to extend Uintah to heterogeneous systems, with ever-increasing CPU core counts and additional onnode GPUs, a new dynamic CPU-GPU task scheduler is designed and evaluated in this study. This new scheduler enables Uintah to fully exploit these architectures with support for asynchronous, outof- order scheduling of both CPU and GPU computational tasks. A new runtime system has also been implemented with an added multi-stage queuing architecture for efficient scheduling of CPU and GPU tasks. This new runtime system automatically handles the details of asynchronous memory copies to and from the GPU and introduces a novel method of pre-fetching and preparing GPU memory prior to GPU task execution. In this study this new design is examined in the context of a developing, hierarchical GPUbased ray tracing radiation transport model that provides Uintah with additional capabilities for heat transfer and electromagnetic wave propagation. The capabilities of this new scheduler design are tested by running at large scale on the modern heterogeneous systems, Keeneland and TitanDev, with up to 360 and 960 GPUs respectively. On these systems, we demonstrate significant speedups per GPU against a standard CPU core for our radiation problem.

Keywords: Uintah, hybrid parallelism, scalability, parallel, adaptive, GPU, heterogeneous systems, Keeneland, TitanDev



M. Kim, G. Chen, C.D. Hansen. “Dynamic particle system for mesh extraction on the GPU,” In Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, London, England, GPGPU-5, ACM, New York, NY, USA pp. 38--46. 2012.
ISBN: 978-1-4503-1233-2
DOI: 10.1145/2159430.215943

ABSTRACT

Extracting isosurfaces represented as high quality meshes from three-dimensional scalar fields is needed for many important applications, particularly visualization and numerical simulations. One recent advance for extracting high quality meshes for isosurface computation is based on a dynamic particle system. Unfortunately, this state-of-the-art particle placement technique requires a significant amount of time to produce a satisfactory mesh. To address this issue, we study the parallelism property of the particle placement and make use of CUDA, a parallel programming technique on the GPU, to significantly improve the performance of particle placement. This paper describes the curvature dependent sampling method used to extract high quality meshes and describes its implementation using CUDA on the GPU.

Keywords: CUDA, GPGPU, particle systems, volumetric data mesh extraction



S. Liu, J.A. Levine, P.-T. Bremer, V. Pascucci. “Gaussian Mixture Model Based Volume Visualization,” In Proceedings of the IEEE Large-Scale Data Analysis and Visualization Symposium 2012, Note: Received Best Paper Award, pp. 73--77. 2012.
DOI: 10.1109/LDAV.2012.6378978

ABSTRACT

Representing uncertainty when creating visualizations is becoming more indispensable to understand and analyze scientific data. Uncertainty may come from different sources, such as, ensembles of experiments or unavoidable information loss when performing data reduction. One natural model to represent uncertainty is to assume that each position in space instead of a single value may take on a distribution of values. In this paper we present a new volume rendering method using per voxel Gaussian mixture models (GMMs) as the input data representation. GMMs are an elegant and compact way to drastically reduce the amount of data stored while still enabling realtime data access and rendering on the GPU. Our renderer offers efficient sampling of the data distribution, generating renderings of the data that flicker at each frame to indicate high variance. We can accumulate samples as well to generate still frames of the data, which preserve additional details in the data as compared to either traditional scalar indicators (such as a mean or a single nearest neighbor down sample) or to fitting the data with only a single Gaussian per voxel. We demonstrate the effectiveness of our method using ensembles of climate simulations and MRI scans as well as the down sampling of large scalar fields as examples.

Keywords: Uncertainty Visualization, Volume Rendering, Gaussian Mixture Model, Ensemble Visualization



Q. Meng, A. Humphrey, M. Berzins. “The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System,” In Digital Proceedings of The International Conference for High Performance Computing, Networking, Storage and Analysis, Note: SC’12 –2nd International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, WOLFHPC 2012, pp. 2441--2448. 2012.
DOI: 10.1109/SCC.2012.6674233

ABSTRACT

The development of a new unified, multi-threaded runtime system for the execution of asynchronous tasks on heterogeneous systems is described in this work. These asynchronous tasks arise from the Uintah framework, which was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah has a clear separation between its MPI-free user-coded tasks and its runtime system that ensures these tasks execute efficiently. This separation also allows for complete isolation of the application developer from the complexities involved with the parallelism Uintah provides. While we have designed scalable runtime systems for large CPU core counts, the emergence of heterogeneous systems, with additional on-node accelerators and co-processors presents additional design challenges in terms of effectively utilizing all computational resources on-node and managing multiple levels of parallelism. Our work addresses these challenges for Uintah by the development of new hybrid runtime system and Unified multi-threaded MPI task scheduler, enabling Uintah to fully exploit current and emerging architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. This design coupled with an approach that uses MPI to communicate between nodes, a shared memory model on-node and the use of novel lock-free data structures, has made it possible for Uintah to achieve excellent scalability for challenging fluid-structure problems using adaptive mesh refinement on as many as 256K cores on the DoE Jaguar XK6 system. This design has also demonstrated an ability to run capability jobs on the heterogeneous systems, Keeneland and TitanDev. In this work, the evolution of Uintah and its runtime system is examined in the context of our new Unified multi-threaded scheduler design. The performance of the Unified scheduler is also tested against previous Uintah scheduler and runtime designs over a range of processor core and GPU counts.



B. Nelson, E. Liu, R.M. Kirby, R. Haimes. “ElVis: A System for the Accurate and Interactive Visualization of High-Order Finite Element Solutions,” In IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 18, No. 12, pp. 2325--2334. Dec, 2012.
DOI: 10.1109/TVCG.2012.218

ABSTRACT

This paper presents the Element Visualizer (ElVis), a new, open-source scientific visualization system for use with high-order finite element solutions to PDEs in three dimensions. This system is designed to minimize visualization errors of these types of fields by querying the underlying finite element basis functions (e.g., high-order polynomials) directly, leading to pixel-exact representations of solutions and geometry. The system interacts with simulation data through runtime plugins, which only require users to implement a handful of operations fundamental to finite element solvers. The data in turn can be visualized through the use of cut surfaces, contours, isosurfaces, and volume rendering. These visualization algorithms are implemented using NVIDIA's OptiX GPU-based ray-tracing engine, which provides accelerated ray traversal of the high-order geometry, and CUDA, which allows for effective parallel evaluation of the visualization algorithms. The direct interface between ElVis and the underlying data differentiates it from existing visualization tools. Current tools assume the underlying data is composed of linear primitives; high-order data must be interpolated with linear functions as a result. In this work, examples drawn from aerodynamic simulations-high-order discontinuous Galerkin finite element solutions of aerodynamic flows in particular-will demonstrate the superiority of ElVis' pixel-exact approach when compared with traditional linear-interpolation methods. Such methods can introduce a number of inaccuracies in the resulting visualization, making it unclear if visual artifacts are genuine to the solution data or if these artifacts are the result of interpolation errors. Linear methods additionally cannot properly visualize curved geometries (elements or boundaries) which can greatly inhibit developers' debugging efforts. As we will show, pixel-exact visualization exhibits none of these issues, removing the visualization scheme as a source of - ncertainty for engineers using ElVis.


2011


L.K. Ha, J. Krüger, J. Comba, S. Joshi, C.T. Silva. “Optimal Multi-Image Processing Streaming Framework on Parallel Heterogeneous Systems,” In Proceedings of Eurographics Symposium on Parallel Graphics and Visualization 2011, Note: Awarded Best Paper!, pp. 1--10. 2011.
DOI: 10.2312/EGPGV/EGPGV11/001-010

ABSTRACT

Atlas construction is an important technique in medical image analysis that plays a central role in understanding the variability of brain anatomy. The construction often requires applying image processing operations to multiple images (often hundreds of volumetric datasets), which is challenging in computational power as well as memory requirements. In this paper we introduce MIP, a Multi-Image Processing streaming framework to harness the processing power of heterogeneous CPU/GPU systems. In MIP we introduce specially designed streaming algorithms and data structures that provides an optimal solution for out-of-core multi-image processing problems both in terms of memory usage and computational efficiency. MIP makes use of the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, with computationally intensive problems, the MIP out-of-core solution could achieve the same performance as the in-core solution. We demonstrate the efficiency of the MIP framework on synthetic and real datasets.



L.K. Ha, J. Krüger, S. Joshi, C.T. Silva. “Multi-scale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs,” Vol. 1, Ch. 10, Morgan Kaufmann, pp. 42. 2011.

ABSTRACT

In this chapter, we present a high performance multi-scale 3D image processing framework to exploit the parallel processing power of multiple graphic processing units (Multi-GPUs) for medical image analysis. We developed GPU algorithms and data structures that can be applied to a wide range of 3D image processing applications and efficiently exploit the computational power and massive bandwidth offered by modern GPUs. Our framework helps scientists solve computationally intensive problems which previously required super computing power. To demonstrate the effectiveness of our framework and to compare to existing techniques, we focus our discussions on atlas construction - the application of understanding the development of the brain and the progression of brain diseases.



B. Nelson, R. Haimes, R.M. Kirby. “GPU-Based Interactive Cut-Surface Extraction From High-0rder Finite Element Fields,” In IEEE Transactions on Visualization and Computer Graphics (IEEE Visualization Issue), Vol. 17, No. 12, pp. 1803--1811. 2011.

ABSTRACT

We present a GPU-based ray-tracing system for the accurate and interactive visualization of cut-surfaces through 3D simulations of physical processes created from spectral/hp high-order finite element methods. When used by the numerical analyst to debug the solver, the ability for the imagery to precisely reflect the data is critical. In practice, the investigator interactively selects from a palette of visualization tools to construct a scene that can answer a query of the data. This is effective as long as the implicit contract of image quality between the individual and the visualization system is upheld. OpenGL rendering of scientific visualizations has worked remarkably well for exploratory visualization for most solver results. This is due to the consistency between the use of first-order representations in the simulation and the linear assumptions inherent in OpenGL (planar fragments and color-space interpolation). Unfortunately, the contract is broken when the solver discretization is of higher-order. There have been attempts to mitigate this through the use of spatial adaptation and/or texture mapping. These methods do a better job of approximating what the imagery should be but are not exact and tend to be view-dependent. This paper introduces new rendering mechanisms that specifically deal with the kinds of native data generated by high-order finite element solvers. The exploratory visualization tools are reassessed and cast in this system with the focus on image accuracy. This is accomplished in a GPU setting to ensure interactivity.



S. Philip, B. Summa, P-T Bremer, V. Pascucci. “Hybrid CPU-GPU Solver for Gradient Domain Processing of Massive Images,” In Proceedings of 2011 International Conference on Parallel and Distributed Systems (ICPADS), pp. 244--251. 2011.

ABSTRACT

Gradient domain processing is a computationally expensive image processing technique. Its use for processing massive images, giga or terapixels in size, can take several hours with serial techniques. To address this challenge, parallel algorithms are being developed to make this class of techniques applicable to the largest images available with running times that are more acceptable to the users. To this end we target the most ubiquitous form of computing power available today, which is small or medium scale clusters of commodity hardware. Such clusters are continuously increasing in scale, not only in the number of nodes, but also in the amount of parallelism available within each node in the form of multicore CPUs and GPUs. In this paper we present a hybrid parallel implementation of gradient domain processing for seamless stitching of gigapixel panoramas that utilizes MPI, threading and a CUDA based GPU component. We demonstrate the performance and scalability of our implementation by presenting results from two GPU clusters processing two large data sets.



H.T. Vo, C.T. Silva, L.F. Scheidegger, V. Pascucci. “Simple and Efficient Mesh Layout with Space-Filling Curves,” In Journal of Graphics, GPU, and Game Tools, pp. 25--39. 2011.
ISSN: 2151-237X


2010


T. Fogal, H. Childs, S. Shankar, J. Krüger, R.D. Bergeron, P. Hatcher. “Large Data Visualization on Distributed Memory Multi-GPU Clusters,” In Proceedings of High Performance Graphics 2010, pp. 57--66. 2010.