Designed especially for neurobiologists, FluoRender is an interactive tool for multi-channel fluorescence microscopy data visualization and analysis.
Large scale visualization on the Powerwall.
BrainStimulator is a set of networks that are used in SCIRun to perform simulations of brain stimulation such as transcranial direct current stimulation (tDCS) and magnetic transcranial stimulation (TMS).
Developing software tools for science has always been a central vision of the SCI Institute.

Scientific Computing

Numerical simulation of real-world phenomena provides fertile ground for building interdisciplinary relationships. The SCI Institute has a long tradition of building these relationships in a win-win fashion – a win for the theoretical and algorithmic development of numerical modeling and simulation techniques and a win for the discipline-specific science of interest. High-order and adaptive methods, uncertainty quantification, complexity analysis, and parallelization are just some of the topics being investigated by SCI faculty. These areas of computing are being applied to a wide variety of engineering applications ranging from fluid mechanics and solid mechanics to bioelectricity.


martin

Martin Berzins

Parallel Computing
GPUs
mike

Mike Kirby

Finite Element Methods
Uncertainty Quantification
GPUs
pascucci

Valerio Pascucci

Scientific Data Management
chris

Chris Johnson

Problem Solving Environments
ross

Ross Whitaker

GPUs
chuck

Chuck Hansen

GPUs
       

Scientific Computing Project Sites:


Publications in Scientific Computing:


Shared-Memory Parallel Computation of Morse-Smale Complexes with Improved Accuracy
A. Gyulassy, P.-T. Bremer, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, Vol. 25, No. 1, IEEE, pp. 1183--1192. Jan, 2019.
DOI: 10.1109/tvcg.2018.2864848

Topological techniques have proven to be a powerful tool in the analysis and visualization of large-scale scientific data. In particular, the Morse-Smale complex and its various components provide a rich framework for robust feature definition and computation. Consequently, there now exist a number of approaches to compute Morse-Smale complexes for large-scale data in parallel. However, existing techniques are based on discrete concepts which produce the correct topological structure but are known to introduce grid artifacts in the resulting geometry. Here, we present a new approach that combines parallel streamline computation with combinatorial methods to construct a high-quality discrete Morse-Smale complex. In addition to being invariant to the orientation of the underlying grid, this algorithm allows users to selectively build a subset of features using high-quality geometry. In particular, a user may specifically select which ascending/descending manifolds are reconstructed with improved accuracy, focusing computational effort where it matters for subsequent analysis. This approach computes Morse-Smale complexes for larger data than previously feasible with significant speedups. We demonstrate and validate our approach using several examples from a variety of different scientific domains, and evaluate the performance of our method.



A Task-Based Abstraction Layer for User Productivity and Performance Portability in Post-Moore’s Era Supercomputing
S. Petruzza, A. Gyulassy, V. Pascucci,, P. T. Bremer. In 3RD INTERNATIONAL WORKSHOP ON POST-MOORE’S ERA SUPERCOMPUTING (PMES), 2018.

The proliferation of heterogeneous computing architectures in current and future supercomputing systems dramatically increases the complexity of software development and exacerbates the divergence of software stacks. Currently, task-based runtimes attempt to alleviate these impediments, however their effective use requires expertise and deep integration that does not facilitate reuse and portability. We propose to introduce a task-based abstraction layer that separates the definition of the algorithm from the runtime-specific implementation, while maintaining performance portability.



A Task-Based Abstraction Layer for User Productivity and Performance Portability in Post-Moore’s Era Supercomputing
S. Petruzza, A. Gyulassy, V. Pascucci,, P. T. Bremer. In 3RD INTERNATIONAL WORKSHOP ON POST-MOORE’S ERA SUPERCOMPUTING (PMES), 2018.

The proliferation of heterogeneous computing architectures in current and future supercomputing systems dramatically increases the complexity of software development and exacerbates the divergence of software stacks. Currently, task-based runtimes attempt to alleviate these impediments, however their effective use requires expertise and deep integration that does not facilitate reuse and portability. We propose to introduce a task-based abstraction layer that separates the definition of the algorithm from the runtime-specific implementation, while maintaining performance portability.



Performance Optimization Strategies for WRF Physics Schemes Used in Weather Modeling
T.A.J, Ouermi, R. M. Kirby,, M. Berzins. In International Journal of Networking and Computing, Vol. 8, No. 2, IJNC , pp. 301--327. 2018.
DOI: 10.15803/ijnc.8.2_301

Performance optimization in the petascale era and beyond in the exascale era has and will require modifications of legacy codes to take advantage of new architectures with large core counts and SIMD units. The Numerical Weather Prediction (NWP) physics codes considered here are optimized using thread-local structures of arrays (SOA). High-level and low-level optimization strategies are applied to the WRF Single-Moment 6-Class Microphysics Scheme (WSM6) and Global Forecast System (GFS) physics codes used in the NEPTUNE forecast code. By building on previous work optimizing WSM6 on the Intel Knights Landing (KNL), it is shown how to further optimize WMS6 and GFS physics, and GFS radiation on Intel KNL, Haswell, and potentially on future micro-architectures with many cores and SIMD vector units. The optimization techniques used herein employ thread-local structures of arrays (SOA), an OpenMP directive, OMP SIMD, and minor code transformations to enable better utilization of SIMD units, increase parallelism, improve locality, and reduce memory traffic. The optimized versions of WSM6, GFS physics, GFS radiation run 70, 27, and 23 faster (respectively) on KNL and 26, 18 and 30 faster (respectively) on Haswell than their respective original serial versions. Although this work targets WRF physics schemes, the findings are transferable to other performance optimization contexts and provide insight into the optimization of codes with complex physical models for present and near-future architectures with many core and vector units.



Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime
B. Peterson, A. Humphrey, D. Sunderland, J. Sutherland, T. Saad, H. Dasari, M. Berzins. In International Journal of Parallel Programming, Dec, 2018.
ISSN: 1573-7640
DOI: 10.1007/s10766-018-0619-1

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph (DAG) of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node perform thousands of different halo transfers among thousands simulation variables. The principal contribution of this work is to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.



Coupling the Uintah Framework and the VisIt Toolkit for Parallel In Situ Data Analysis and Visualization and Computational Steering
A. Sanderson, A. Humphrey, J. Schmidt, R. Sisneros. In High Performance Computing, June, 2018.

Data analysis and visualization are an essential part of the scientific discovery process. As HPC simulations have grown, I/O has become a bottleneck, which has required scientists to turn to in situ tools for simulation data exploration. Incorporating additional data, such as runtime performance data, into the analysis or I/O phases of a workflow is routinely avoided for fear of excaberting performance issues. The paper presents how the Uintah Framework, a suite of HPC libraries and applications for simulating complex chemical and physical reactions, was coupled with VisIt, an interactive analysis and visualization toolkit, to allow scientists to perform parallel in situ visualization of simulation and runtime performance data. An additional benefit of the coupling made it possible to create a "simulation dashboard" that allowed for in situ computational steering and visual debugging.



Demonstrating GPU Code Portability and Scalability for Radiative Heat Transfer Computations
B. Peterson, A. Humphrey, J. Holmen T. Harman, M. Berzins, D. Sunderland, H.C. Edwards. In Journal of Computational Science, Elsevier BV, June, 2018.
ISSN: 1877-7503
DOI: 10.1016/j.jocs.2018.06.005

High performance computing frameworks utilizing CPUs, Nvidia GPUs, and/or Intel Xeon Phis necessitate portable and scalable solutions for application developers. Nvidia GPUs in particular present numerous portability challenges with a different programming model, additional memory hierarchies, and partitioned execution units among streaming multiprocessors. This work presents modifications to the Uintah asynchronous many-task runtime and the Kokkos portability library to enable one single codebase for complex multiphysics applications to run across different architectures. Scalability and performance results are shown on multiple architectures for a globally coupled radiation heat transfer simulation, ranging from a single node to 16384 Titan compute nodes.



Numerical integration in multiple dimensions with designed quadrature
V. Keshavarzzadeh, R.M. Kirby, A. Narayan. In CoRR, 2018.

We present a systematic computational framework for generating positive quadrature rules in multiple dimensions on general geometries. A direct moment-matching formulation that enforces exact integration on polynomial subspaces yields nonlinear conditions and geometric constraints on nodes and weights. We use penalty methods to address the geometric constraints, and subsequently solve a quadratic minimization problem via the Gauss-Newton method. Our analysis provides guidance on requisite sizes of quadrature rules for a given polynomial subspace, and furnishes useful user-end stability bounds on error in the quadrature rule in the case when the polynomial moment conditions are violated by a small amount due to, e.g., finite precision limitations or stagnation of the optimization procedure. We present several numerical examples investigating optimal low-degree quadrature rules, Lebesgue constants, and 100-dimensional quadrature. Our capstone examples compare our quadrature approach to popular alternatives, such as sparse grids and quasi-Monte Carlo methods, for problems in linear elasticity and topology optimization.



On the treatment of field quantities and elemental continuity in fem solutions
A. Jallepalli, J. Docampo-Sánchez, J.K. Ryan, R. Haimes, R.M. Kirby. In IEEE Transactions on Visualization and Computer Graphics, Vol. 24, No. 1, IEEE, pp. 903--912. Jan, 2018.
DOI: 10.1109/tvcg.2017.2744058

As the finite element method (FEM) and the finite volume method (FVM), both traditional and high-order variants, continue their proliferation into various applied engineering disciplines, it is important that the visualization techniques and corresponding data analysis tools that act on the results produced by these methods faithfully represent the underlying data. To state this in another way: the interpretation of data generated by simulation needs to be consistent with the numerical schemes that underpin the specific solver technology. As the verifiable visualization literature has demonstrated: visual artifacts produced by the introduction of either explicit or implicit data transformations, such as data resampling, can sometimes distort or even obfuscate key scientific features in the data. In this paper, we focus on the handling of elemental continuity, which is often only C0 continuous or piecewise discontinuous, when visualizing primary or derived fields from FEM or FVM simulations. We demonstrate that traditional data handling and visualization of these fields introduce visual errors. In addition, we show how the use of the recently proposed line-SIAC filter provides a way of handling elemental continuity issues in an accuracy-conserving manner with the added benefit of casting the data in a smooth context even if the representation is element discontinuous.



Weighted approximate fekete points: sampling for least-squares polynomial approximation
L. Guo, A. Narayan, L. Yan, T. Zhou. In SIAM Journal on Scientific Computing, Vol. 40, No. 1, SIAM, pp. A366--A387. Jan, 2018.
DOI: 10.1137/17m1140960

We propose and analyze a weighted greedy scheme for computing deterministic sample configurations in multidimensional space for performing least-squares polynomial approximations on $L^2$ spaces weighted by a probability density function. Our procedure is a particular weighted version of the approximate Fekete points method, with the weight function chosen as the (inverse) Christoffel function. Our procedure has theoretical advantages: when linear systems with optimal condition number exist, the procedure finds them. In the one-dimensional setting with any density function, our greedy procedure almost always generates optimally conditioned linear systems. Our method also has practical advantages: our procedure is impartial to the compactness of the domain of approximation and uses only pivoted linear algebraic routines. We show through numerous examples that our sampling design outperforms competing randomized and deterministic designs when the domain is both low and high dimensional.



Spectral Element and hp Methods,
Y. Yu, R.M. Kirby, G.E. Karniadakis. In Encyclopedia of Computational Mechanics Second Edition, John Wiley & Sons, Ltd, pp. 1--43. 2018.

Spectral/hp element methods provide high‐order discretization, which is essential in the longtime integration of advection–diffusion systems and for capturing dynamic instabilities in solids. In this chapter, we review the main formulations for simulations of incompressible and compressible viscous flows as well as for solid mechanics and present several examples with some emphasis on fluid–structure interactions and interfaces. The first generation of (nodal) spectral elements was limited to relatively simple geometries and smooth solutions. However, the new generation of spectral/hp elements, consisting of both nodal and modal forms, can handle very complex geometries using unstructured grids and can capture strong shocks by employing discontinuous Galerkin methods. New flexible formulations allow simulations of multiphysics problems including extremely complex geometries and multiphase flows. Several implementation strategies have also been developed on the basis of multilevel parallel algorithms that allow dynamic p ‐refinement at constant wall clock time. After three decades of intense developments, spectral element and hp methods are mature and efficient to be used effectively in applications of industrial complexity. They provide the capabilities that standard finite element and finite volume methods do, but, in addition, they exhibit high‐order accuracy and error control.



Curvilinear Mesh Adaptation Using Radial Basis Function Interpolation and Smoothing
V. Zala, V. Shankar, S.P. Sastry, R.M. Kirby. In Journal of Scientific Computing, Springer Nature, pp. 1--22. April, 2018.
DOI: 10.1007/s10915-018-0711-0

We present a new iterative technique based on radial basis function (RBF) interpolation and smoothing for the generation and smoothing of curvilinear meshes from straight-sided or other curvilinear meshes. Our technique approximates the coordinate deformation maps in both the interior and boundary of the curvilinear output mesh by using only scattered nodes on the boundary of the input mesh as data sites in an interpolation problem. Our technique produces high-quality meshes in the deformed domain even when the deformation maps are singular due to a new iterative algorithm based on modification of the RBF shape parameter. Due to the use of RBF interpolation, our technique is applicable to both 2D and 3D curvilinear mesh generation without significant modification.



Flexible Live‐Wire: Image Segmentation with Floating Anchors
B. Summa, N. Faraj, C. Licorish, V. Pascucci. In Computer Graphics Forum, Vol. 37, No. 2, Wiley, pp. 321-328. May, 2018.
DOI: 10.1111/cgf.13364

We introduce Flexible Live‐Wire, a generalization of the Live‐Wire interactive segmentation technique with floating anchors. In our approach, the user input for Live‐Wire is no longer limited to the setting of pixel‐level anchor nodes, but can use more general anchor sets. These sets can be of any dimension, size, or connectedness. The generality of the approach allows the design of a number of user interactions while providing the same functionality as the traditional Live‐Wire. In particular, we experiment with this new flexibility by designing four novel Live‐Wire interactions based on specific primitives: paint, pinch, probable, and pick anchors. These interactions are only a subset of the possibilities enabled by our generalization. Moreover, we discuss the computational aspects of this approach and provide practical solutions to alleviate any additional overhead. Finally, we illustrate our approach and new interactions through several example segmentations.



Uncertainty quantification guided robust design for nanoparticles' morphology
Y. He, M. Razi, C. Forestiere, L. Dal Negro, R.M. Kirby. In Computer Methods in Applied Mechanics and Engineering, Elsevier BV, pp. 578--593. July, 2018.
DOI: 10.1016/j.cma.2018.03.027

The automatic inverse design of three-dimensional plasmonic nanoparticles enables scientists and engineers to explore a wide design space and to maximize a device's performance. However, due to the large uncertainty in the nanofabrication process, we may not be able to obtain a deterministic value of the objective, and the objective may vary dramatically with respect to a small variation in uncertain parameters. Therefore, we take into account the uncertainty in simulations and adopt a classical robust design model for a robust design. In addition, we propose an efficient numerical procedure for the robust design to reduce the computational cost of the process caused by the consideration of the uncertainty. Specifically, we use a global sensitivity analysis method to identify the important random variables and consider the non-important ones as deterministic, and consequently reduce the dimension of the stochastic space. In addition, we apply the generalized polynomial chaos expansion method for constructing computationally cheaper surrogate models to approximate and replace the full simulations. This efficient robust design procedure is performed by varying the particles' material among the most commonly used plasmonic materials such as gold, silver, and aluminum, to obtain different robust optimal shapes for the best enhancement of electric fields.



A gradient enhanced ℓ1-minimization for sparse approximation of polynomial chaos expansions
L. Guo, A. Narayan, T. Zhou. In Journal of Computational Physics, Vol. 367, Elsevier BV, pp. 49--64. Aug, 2018.

We investigate a gradient enhanced ℓ 1-minimization for constructing sparse polynomial chaos expansions. In addition to function evaluations, measurements of the function gradient is also included to accelerate the identification of expansion coefficients. By designing appropriate preconditioners to the measurement matrix, we show gradient enhanced ℓ 1 minimization leads to stable and accurate coefficient recovery. The framework for designing preconditioners is quite general and it applies to recover of functions whose domain is bounded or unbounded. Comparisons between the gradient enhanced approach and the standard ℓ 1-minimization are also presented and numerical examples suggest that the inclusion of derivative information can guarantee sparse recovery at a reduced computational cost.



Practical error bounds for a non-intrusive bi-fidelity approach to parametric/stochastic model reduction
J. Hampton, HR. Fairbanks, A. Narayan, A. Doostan. In Journal of Computational Physics, Vol. 368, Elsevier BV, pp. 315--332. September, 2018.
DOI: 10.1016/j.jcp.2018.04.015

For practical model-based demands, such as design space exploration and uncertainty quantification (UQ), a high-fidelity model that produces accurate outputs often has high computational cost, while a low-fidelity model with less accurate outputs has low computational cost. It is often possible to construct a bi-fidelity model having accuracy comparable with the high-fidelity model and computational cost comparable with the low-fidelity model. This work presents the construction and analysis of a non-intrusive (i.e., sample-based) bi-fidelity model that relies on the low-rank structure of the map between model parameters/uncertain inputs and the solution of interest, if exists. Specifically, we derive a novel, pragmatic estimate for the error committed by this bi-fidelity model. We show that this error bound can be used to determine if a given pair of low- and high-fidelity models will lead to an accurate bi-fidelity approximation. The cost of this error bound is relatively small and depends on the solution rank. The value of this error estimate is demonstrated using two example problems in the context of UQ, involving linear and non-linear partial differential equations.



Fast predictive models based on multi-fidelity sampling of properties in molecular dynamics simulations
M. Razi, A. Narayan, RM. Kirby, D. Bedrov. In Computational Materials Science, Vol. 152, Elsevier BV, pp. 125--133. September, 2018.
DOI: 10.1016/j.commatsci.2018.05.029

In this paper we introduce a novel approach for enhancing the sampling convergence for properties predicted by molecular dynamics. The proposed approach is based upon the construction of a multi-fidelity surrogate model using computational models with different levels of accuracy. While low fidelity models produce result with a lower level of accuracy and computational cost, in this framework they can provide the basis for identification of the optimal sparse sampling pattern for high fidelity models to construct an accurate surrogate model. Such an approach can provide a significant computational saving for the estimation of the quantities of interest for the underlying physical/engineering systems. In the present work, this methodology is demonstrated for molecular dynamics simulations of a Lennard-Jones fluid. Levels of multi-fidelity are defined based upon the integration time step employed in the simulation. The proposed approach is applied to two different canonical problems including (i) single component fluid and (ii) binary glass-forming mixture. The results show about 70% computational saving for the estimation of averaged properties of the systems such as total energy, self diffusion coefficient, radial distribution function and mean squared displacements with a reasonable accuracy.



A Preliminary Port and Evaluation of the Uintah AMT Runtime on Sunway TaihuLight
Z. Yang, D. Sahasrabudhe, A. Humphrey, M. Berzins. In 9th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018), IEEE, May, 2018.

The Sunway TaihuLight is the world's fastest supercomputer at the present time with a low power consumption per flop and a unique set of architectural features. Applications performance depends heavily on being able to adapt codes to make best use of these features. Porting large codes to novel architectures such as Sunway is both time-consuming and expensive, as modifications throughout the code may be needed. One alternative to conventional porting is to consider an approach based upon Asynchronous Many Task (AMT) Runtimes such as the Uintah framework considered here. Uintah structures the problem as a series of tasks that are executed by the runtime via a task scheduler. The central challenge in porting a large AMT runtime like Uintah is thus to consider how to devise an appropriate scheduler and how to write tasks to take advantage of a particular architecture. It will be shown how an asynchronous Sunway-specific scheduler, based upon MPI and athreads, may be written and how individual taskcode for a typical but model structured-grid fluid-flow problem needs to be refactored. Preliminary experiments show that it is possible to obtain a strong-scaling efficiency ranging from 31.7% to 96.1% for different problem sizes with full optimizations. The asynchronous scheduler developed here improves the overall performance over a synchronous alternative by at most 22.8%, and the fluid-flow simulation reaches 1.17% the theoretical peak of the running nodes. Conclusions are drawn for the porting of full-scale Uintah applications.



Scalable Data Management of the Uintah Simulation Framework for Next-Generation Engineering Problems with Radiation
S. Kumar, A. Humphrey, W. Usher, S. Petruzza, B. Peterson, J. A. Schmidt, D. Harris, B. Isaac, J. Thornock, T. Harman, V. Pascucci,, M. Berzins. In Supercomputing Frontiers, Springer International Publishing, pp. 219--240. 2018.
ISBN: 978-3-319-69953-0
DOI: 10.1007/978-3-319-69953-0_13

The need to scale next-generation industrial engineering problems to the largest computational platforms presents unique challenges. This paper focuses on data management related problems faced by the Uintah simulation framework at a production scale of 260K processes. Uintah provides a highly scalable asynchronous many-task runtime system, which in this work is used for the modeling of a 1000 megawatt electric (MWe) ultra-supercritical (USC) coal boiler. At 260K processes, we faced both parallel I/O and visualization related challenges, e.g., the default file-per-process I/O approach of Uintah did not scale on Mira. In this paper we present a simple to implement, restructuring based parallel I/O technique. We impose a restructuring step that alters the distribution of data among processes. The goal is to distribute the dataset such that each process holds a larger chunk of data, which is then written to a file independently. This approach finds a middle ground between two of the most common parallel I/O schemes--file per process I/O and shared file I/O--in terms of both the total number of generated files, and the extent of communication involved during the data aggregation phase. To address scalability issues when visualizing the simulation data, we developed a lightweight renderer using OSPRay, which allows scientists to visualize the data interactively at high quality and make production movies. Finally, this work presents a highly efficient and scalable radiation model based on the sweeping method, which significantly outperforms previous approaches in Uintah, like discrete ordinates. The integrated approach allowed the USC boiler problem to run on 260K CPU cores on Mira.



Nonlinear stability and time step selection for the MPM method
M. Berzins. In Computational Particle Mechanics, Jan, 2018.
ISSN: 2196-4386
DOI: 10.1007/s40571-018-0182-y

The Material Point Method (MPM) has been developed from the Particle in Cell (PIC) method over the last 25 years and has proved its worth in solving many challenging problems involving large deformations. Nevertheless there are many open questions regarding the theoretical properties of MPM. For example in while Fourier methods, as applied to PIC may provide useful insight, the non-linear nature of MPM makes it necessary to use a full non-linear stability analysis to determine a stable time step for MPM. In order to begin to address this the stability analysis of Spigler and Vianello is adapted to MPM and used to derive a stable time step bound for a model problem. This bound is contrasted against traditional Speed of sound and CFL bounds and shown to be a realistic stability bound for a model problem.