Scientific Computing

Numerical simulation of real-world phenomena provides fertile ground for building interdisciplinary relationships. The SCI Institute has a long tradition of building these relationships in a win-win fashion – a win for the theoretical and algorithmic development of numerical modeling and simulation techniques and a win for the discipline-specific science of interest. High-order and adaptive methods, uncertainty quantification, complexity analysis, and parallelization are just some of the topics being investigated by SCI faculty. These areas of computing are being applied to a wide variety of engineering applications ranging from fluid mechanics and solid mechanics to bioelectricity.

Martin Berzins

Parallel Computing
GPUs

Mike Kirby

Finite Element Methods
Uncertainty Quantification
GPUs

Valerio Pascucci

Scientific Data Management

Chris Johnson

Problem Solving Environments

Ross Whitaker

GPUs

Chuck Hansen

GPUs

Amir Arzani

Scientific machine learning
Data-driven fluid flow modeling

Funded Research Projects:

Optimal Approximation Algorithms in High Dimensions

Akil Narayan
The increasing power of modern computational hardware has enabled computer-based simulation of sophisticated mathematical models that resolve important physical phenomena in great detail. With the advent of these computational abilities has come an increased demand to include more complex physical interactions in the models, and thus an increased strain on computational resources. Modern engineering design utilizes such models, and these design problems typically involve (1) numerous tunable parameters that affect reliability, cost, and failure, (2) uncertainty about external influences manifesting as randomness in the model, and (3) epistemic ignorance involving model form uncertainty. In realistic applications, the collection of these effects leads to predictions that depend on a cumulatively high-dimensional parameter. This project focuses on development and deployment of novel, near-optimal experimental design and sampling algorithms for the accurate and efficient simulation of physical models parameterized by high-dimensional inputs. The work of this project involves the application of recently developed approximation theory results in the computational arena, targeted advances that extend theoretical mathematics for computational purposes, and the development and implementation of algorithms for large-scale computations.

The technical aspects of this project are designed to provide feasible computational algorithms and concrete mathematical guarantees for tasks in high-dimensional approximation. The three major core components for the completion of this task involve the design, implementation, and analysis of algorithms that leverage optimality characteristics of (1) random and deterministic experimental and sampling design, (2) computational algorithms for identifying efficient sampling schemes, and (3) strategies and techniques for emerging approximation paradigms such as sparse approximation and dimension reduction. A crosscutting theme is application of these methods to problems of modern interest in scientific computing. This project involves fundamental contributions to the fields of applied approximation theory and computational approximation methods through the development of applications-oriented sampling designs with provable near-optimality. Theoretical investigations of this project connect classical techniques in approximation and linear algebra with emerging algorithms in data reduction and reduced order modeling. The implementation of these algorithms will significantly enhance theoretical understanding and computational feasibility for goal-oriented design, parameter study and reduction, sparse and compressive representations, model verification and calibration, and data-driven simulations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Cyberinfrastructure Center of Excellence Pilot Study

Ewa Deelman, Valerio Pascucci, Anirban Mandal, Jaroslaw Nabrzyski, Robert Ricci
University of Southern California, Los Angeles, CA, United States

NSF's major multi-user research facilities (large facilities) are sophisticated research instruments and platforms - such as large telescopes, interferometers and distributed sensor arrays - that serve diverse scientific disciplines from astronomy and physics to geoscience and biological science. Large facilities are increasingly dependent on advanced cyberinfrastructure (CI) - computing, data and software systems, networking, and associated human capital - to enable broad delivery and analysis of facility-generated data. As a result of these cyberinfrastructure tools, scientists and the public gain new insights into fundamental questions about the structure and history of the universe, the world we live in today, and how our plants and animals may change in the coming decades. The goal of this pilot project is to develop a model for a Cyberinfrastructure Center of Excellence (CI CoE) that facilitates community building and sharing and applies knowledge of best practices and innovative solutions for facility CI.

The pilot project will explore how such a center would facilitate CI improvements for existing facilities and for the design of new facilities that exploit advanced CI architecture designs and leverage established tools and solutions. The pilot project will also catalyze a key function of an eventual CI CoE - to provide a forum for exchange of experience and knowledge among CI experts. The project will also gather best practices for large facilities, with the aim of enhancing individual facility CI efforts in the broader CI context. The discussion forum and planning effort for a future CI CoE will also address training and workforce development by expanding the pool of skilled facility CI experts and forging career paths for CI professionals. The result of this work will be a strategic plan for a CI CoE that will be evaluated and refined through community interactions: workshops and direct engagement with the facilities and the broader CI community.

This project is being supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering and the Division of Emerging Frontiers in the Directorate for Biological Sciences.

Efficiency and Productivity through Artificial Intelligence

Valerio Pascucci
Efficient cyberinfrastructure (advanced computing, data, software and networking infrastructure) is a critical component of the support that NSF provides for new discoveries in science and engineering. Cyberinfrastructure is complex and traditionally requires years of human hand-tuning to fully achieve maximal performance for scientific users. We propose to introduce Artificial Intelligence (AI) as a way to automatically and quickly optimize the performance and broadest use of recent NSF-supported advanced computing resources. Through this pilot effort our ultimate aim is to enable and accelerate scientific advances in widely diverse fields such as biology, chemistry, oceanography, materials science, climate modeling, and cosmology.

As the research cyberinfrastructure grows rapidly in scale and complexity, it is essential to integrate new technologies based on Machine Learning (ML) and AI to ensure that the investments in new hardware and software components result in proportional improvements in performance and capability. This project will undertake a transformative research activity targeting: (1) scaling ML algorithms to make them easily available to the scientific community; and (2) improving cyberinfrastructure efficiency through AI-based predictive models. This technical work will be complemented and informed by a community engagement effort to jointly catalog the state of the art and identify future challenges and opportunities in enabling a new smart cyberinfrastructure.

Robust and Scalable Multi-Fidelity Algorithms for Model-Based Predictions

Akil Narayan
Modern computational models are complex in nature: accurate predictions of physics require detailed and intensive computational resources. As such, development of accurate scientific models has been the area of research emphasis in recent decades. Today’s scientific models involve large-scale simulation tools, often with many interdependent components, and sometimes requiring days to complete a single simulation. Adding to this complexity is the presence of uncertainty, which is often encoded into models via parameters or random variables. Any direct approach to analyze the impact of parametric variation on such expensive models is infeasible.
One approach to circumvent this limitation is to utilize hierarchies of models, each with differing computational costs and predictive fidelities. Research in the past few years has demonstrated that intelligent allocation of resources across this ensemble of models can produce predictions with much greater accuracy than concentrating all resources in a single model. Such multi-fidelity procedures hold the potential to optimally utilize ensembles of models to make predictions.
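The idea of spreading a budget across models of differing cost can be illustrated with a two-model control-variate estimator. This is a minimal sketch under stated assumptions, not the project's method: `high_fidelity` and `low_fidelity` are hypothetical stand-ins for an expensive simulation and a cheap correlated surrogate, and the input is assumed to be a standard normal random variable.

```python
import random

def high_fidelity(z):
    # Hypothetical expensive model (stand-in for a long-running simulation).
    return z ** 2 + 0.1 * z ** 3

def low_fidelity(z):
    # Hypothetical cheap model that is strongly correlated with the one above.
    return z ** 2

def mean(values):
    return sum(values) / len(values)

def multifidelity_mean(n_high, n_low, seed=0):
    """Control-variate estimate of E[high_fidelity(Z)] for Z ~ N(0, 1)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_low)]
    g_all = [low_fidelity(zi) for zi in z]             # many cheap evaluations
    f_few = [high_fidelity(zi) for zi in z[:n_high]]   # few expensive evaluations
    g_few = g_all[:n_high]
    mf, mg = mean(f_few), mean(g_few)
    # Control-variate weight estimated from the paired samples.
    cov = sum((f - mf) * (g - mg) for f, g in zip(f_few, g_few))
    var = sum((g - mg) ** 2 for g in g_few)
    alpha = cov / var
    # Correct the small-sample high-fidelity mean with the large-sample low-fidelity mean.
    return mf + alpha * (mean(g_all) - mg)
```

Calling `multifidelity_mean(50, 20000)` uses only 50 expensive evaluations; the many cheap evaluations reduce the variance of the estimate relative to averaging the 50 expensive samples alone.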

The main components of this proposed project address optimal resource allocation and robust and scalable model reduction, generation, and learning via low-rank multi-fidelity and multilevel procedures. The overall goal is the construction of surrogate models with accuracy guarantees that can be used in design optimization, inference, and general uncertainty quantification scenarios. The tasks associated with this project involve fundamental mathematical and algorithmic advances in low-rank multi-fidelity methods. Error certificates to ensure accuracy will be developed when possible. Kernel learning techniques will be employed to explore problem-dependent low-rank structure and optimize allocation of resources. Algorithmic methods to handle heterogeneous models, data, and parameter spaces will be developed resulting in a comprehensive framework for utilizing low-rank multi-fidelity methods.

The multi-fidelity procedures devised in this project will also aid in developing novel strategies for model comparison, ranking, discrimination, and genesis. Model comparison and ranking will enable development of a comprehensive multi-fidelity pipeline to automatically learn and update model hierarchies and fidelities. Model generation using the simulation data from a multi-fidelity pipeline allows the automated construction of model emulators that can more easily be explored to detect and exploit low-rank structure.

This project will explore usage of low-rank multi-fidelity methods in two main application areas. The first area is in robust design under uncertainty, which requires robust, accurate, and efficient forward model evaluations. The second area of application is in statistical inference, requiring computationally expensive exploration of posterior distributions. This project will demonstrate the utility of low-rank multi-fidelity methods in acceleration of robust design and inferential tasks. Problems addressed by the work in this project include simulations in topology optimization, nonlocal/fractional differential equation models, modeling of multi-physics solar power receivers, and supersonic channel flow.

UINTAH + HEDGEHOG -- Hybrid Task Graph Execution Library Development for Generalized Work Loads

Martin Berzins
The overall objective is to develop a new Uintah runtime environment that demonstrates a flexible approach for accommodating different task execution and state management strategies, starting from the following points:

1. Uintah uses an asynchronous many-task (AMT) approach that has been shown to strong and weak scale to 256K cores with 16K GPUs on Titan and 768K cores on Mira, through its asynchronous, adaptive, and over-decomposition-based runtime scheduler. This scheduler works on many different and diverse architectures, from many DOE and NSF leadership-class machines to China's Sunway TaihuLight. In addition, this AMT approach, when combined with mesh coarsening, allows for an efficient approach to resilience.

2. HTGS/Hedgehog is a high-performance, single-node, multi-CPU/GPU task-based system developed at NIST. Internal state management and execution strategies at the level of a single node are maintained within an explicit task graph representation. HTGS/Hedgehog has produced competitive results on a single node.

3. Demonstrating that the integration of two different task execution paradigms and the sharing of both local and global state can occur with minimal changes to either library.

The objective is to integrate the HTGS/Hedgehog Task Graph library into the Uintah Runtime. This new runtime would combine the global state management and multi-nodal execution characteristics of Uintah with the local single-node execution facilities of HTGS/Hedgehog. This work would demonstrate how state would be managed across these two different libraries. While the two libraries share many commonalities and architectural similarities, they are distinct in the underlying implementation. Understanding and developing a robust mechanism for sharing global and local state between the two libraries, along with integrating the overall resource management strategies and task execution for multiple CPU/GPU architectures, is the focus of this work.

The objectives will be carried out by first conducting feasibility studies with two different applications (a 3D structured grid application and an imaging analysis application), followed by the prototype implementation of a new Uintah scheduler that integrates the HTGS/Hedgehog library at the nodal level. The two applications will be used to demonstrate scalability and performance on both single-node and multi-node systems. Finally, the proof-of-concept prototype Uintah scheduler implementation will be transformed into a production-level system in the third year of this effort.
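The task-graph execution model that both libraries build on can be illustrated on a single node with a small scheduler that launches each task as soon as its dependencies have finished. This is a minimal sketch and does not reflect the API of either Uintah or HTGS/Hedgehog; `run_task_graph` and its `tasks` and `deps` arguments are hypothetical names chosen for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task_graph(tasks, deps, max_workers=4):
    """Run a dependency graph of tasks asynchronously on a thread pool.

    tasks: dict mapping a task name to a callable that takes a dict of
           its dependencies' results.
    deps:  dict mapping a task name to the names it depends on.
    """
    results = {}
    pending = set(tasks)
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Launch every pending task whose dependencies have all completed.
            ready = [t for t in pending
                     if all(d in results for d in deps.get(t, []))]
            for t in ready:
                inputs = {d: results[d] for d in deps.get(t, [])}
                running[pool.submit(tasks[t], inputs)] = t
                pending.discard(t)
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            # Wake up as soon as any running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results
```

Tasks with no mutual dependencies run concurrently, and no global barrier separates stages: a task starts as soon as its own inputs exist, which is the property an asynchronous runtime scheduler exploits.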

Portable Applications Driven Approach to Scalability on Frontera and Future Exascale Systems

Martin Berzins
The present uncertainty in computer architectures requires software design that allows application codes both to scale across 20K to 100K nodes and to run portably on a range of possible nodal architectures involving a variety of processor technologies, ranging from x86, ARM, and GPUs to possibly FPGAs. At the same time, it is important to use challenging applications to validate the software solutions and to ensure that they are realistic. This project, led by Professor Martin Berzins, will use the Frontera system to help address and demonstrate portability for an important class of engineering applications using the Uintah software.

Uintah software employs an asynchronous many-task approach that has proved to be exceptionally robust at enabling complex engineering applications to run at scale on a broad range of architectures. New and different architectures require not only the ability to execute tasks asynchronously but also the ability to deal with memory hierarchies and to execute efficiently on architectures ranging from x86 to GPUs and a broad range of other possibilities. Uintah uses an approach based upon the Kokkos portability library that makes it possible to build a simple, clean loop-level interface that enables the loops themselves to execute efficiently on different architectures.

The work program will first port and evaluate existing Uintah architectures to Frontera and then consider new applications that apply the Uintah methodology to areas such as unstructured mesh calculations and particle methods applied to biomedical problems. The work program described here covers the application of these ideas to Frontera. The main effort will be through other funded projects, but any funding variability will be accommodated through an adaptive approach to the applications space.

Collaborative Research: Detecting and Preventing Covid-19 with Privacy-Preserving Decentralized Machine Learning

Bao Wang
We are facing scientific challenges caused by COVID-19, including detecting COVID-19 accurately and preventing its spread efficiently. Cutting-edge machine learning technologies, especially modern deep learning techniques, provide feasible avenues to resolve these challenges. Deep learning-based computational imaging algorithms facilitate accurate and rapid COVID-19 diagnosis; sequential modeling with recurrent neural networks or transformers enables accurate and real-time COVID-19 spread prediction. However, most existing black-box deep learning research on COVID-19 is the alchemy of turning unstructured data into gold and is based on systematic trial and error. Current deep learning-based COVID-19 research raises many trustworthiness issues, including unreliable diagnosis, sacrificed data privacy, and lack of interpretability. The lack of interpretable and reliable predictions makes it substantially harder for practitioners to leverage deep learning approaches to detect and prevent COVID-19. Data privacy constraints pose many unresolved challenges. Thus, developing trustworthy machine learning algorithms while preserving data privacy is crucial to detecting and preventing COVID-19.

We are a team of researchers with different expertise and common research interests, who jointly seek to develop theoretically principled decentralized machine learning algorithms that can provide reliable predictions. Furthermore, we focus on applying these machine learning algorithms to accurately and rapidly diagnose COVID-19 patients and predict the virus spread. We propose a challenging but feasible path towards developing privacy-preserving machine learning algorithms to detect and prevent COVID-19. We will integrate our expertise synergistically to develop privacy-preserving decentralized machine learning algorithms with performance guarantees and a high-throughput and low-latency software package to accurately and rapidly detect COVID-19 and effectively prevent its spread. As such, we propose three interconnected thrusts to develop novel neural network architectures based on mathematical principles, efficient privacy-preserving decentralized optimization algorithms, algorithms for spatiotemporal data forecasting and medical image processing and analysis, and an integrated software package to assist in fighting the coronavirus. Each thrust contains multiple theoretical explorations and numerical validation.

Intellectual Merit:
The proposal's intellectual merit includes: (i) development of robust and mathematically principled recurrent neural networks for accurate real-time spatio-temporal forecasting, (ii) development of novel efficient federated and decentralized machine learning algorithms with a performance guarantee, (iii) leveraging the theory of stochastic differential equations to develop new privacy-preserving machine learning mechanisms, (iv) construction of new recurrent neural networks, grounded in epidemiological models, with accurate and interpretable predictions, and (v) development of trustworthy deep learning-based frameworks for COVID-19 diagnosis from multi-modal medical measurements.

Broader Impacts:
The broader impacts of this project are in applying the proposed algorithms and their analysis over a wide range of science and engineering disciplines, such as scientific and medical image analysis, epidemic forecasting, patient monitoring, and microscopic imaging. The project will train a diverse body of graduate and undergraduate students at Michigan State University, the University of Kentucky, and the University of Utah through collaborative education and research activities in applied mathematics, statistics, computer science, data science, physics, and social science. The project also plans research activities involving under-represented students at three universities located in three states. Besides interdisciplinary collaboration with other institutions, we also aim to establish industrial partnerships to extend the proposed project's impact. The developed software will be shared with the general public through GitHub.

Sub-Pilot-Scale Production of High-Value Products for U.S. Coals

Chris Johnson
The primary objectives of this project are to: 1) provide sub-pilot scale verification of lab-scale developments on the production of isotropic and mesophase coal-tar pitch (CTP) for carbon fiber production, using coals from five U.S. coal-producing regions (UT, WY, WV, AK, IL); 2) investigate the production of a high-value β-SiC byproduct using residual coal char from the tar production process; and 3) develop an extensive database and suite of tools for data analysis and economic modeling, to relate process conditions to product quality and to assess the economic viability of coals from different regions for producing specific high-value products.

The University of Utah will use a 0.5 ton/day rotary reactor to pyrolyze coals to produce tars suitable for upgrading to coal tar pitch. The same reactor technology will be used in a second stage to perform the tar upgrading to either mesophase or isotropic pitch, depending on the properties of the original coal. The University of Wyoming will spin the product pitch into carbon fiber, to assess fiber quality arising from different coals and from different processing conditions. The solid char byproduct from coal pyrolysis will be used by the University of Wyoming to produce β-SiC. The University of Utah will work with Marshall University to develop a novel database, coupled with detailed economic models and analysis tools, to provide a means for understanding correlations between coal properties, process conditions and product quality, to allow assessment of the potential economic viability of coals from different regions for producing specific high-value products. Access to some of these computational tools will become available to the public through a web-based community portal.

This effort is a major step towards providing a low-cost carbon fiber product from coal for potential use in automotive and other important markets, and will also lead to new economic development opportunities for communities with coal-based economies.

Experimental Characterization and Modeling of Failure in Post-Buckled Composite Stiffened Panels with a Scarf Repair

Alliance for Multiscale Modeling of Electronic Materials for an Energy Efficient Army

Mike Kirby
The objective of this Alliance is to conduct fundamental research to create MultiScale multidisciplinary Modeling of Electronic materials (MSME) to support development of future electronic materials and devices for the Army. The U.S. Army Research Laboratory (ARL) envisions the MSME Collaborative Research Alliance (CRA) as bringing together government, industrial, and academic institutions to undertake the fundamental research necessary to enable the quantitative understanding of electronic materials from the smallest to the largest relevant scales.

Augmented Design Through Analysis and Visualization Facilitating Better Designs and Enhanced Designers

In Situ Feature Extraction and Visualization from Discontinuous Galerkin Based High-Order Methods

Mike Kirby
The use of simulation science as a means of scientific inquiry is increasing at a tremendous rate. The process of mathematically modeling physical phenomena, estimating key modeling parameters, numerically approximating the solution, and computationally solving the resulting algorithm has inundated the scientific and engineering worlds, allowing for rapid advances in our understanding and utilization of the world around us. The efficacy of simulation science has been, in part, due to two critical components: (1) the identification and minimization of the error budget (e.g. modeling, discretization and uncertainty errors), and equally importantly, (2) evaluation mechanisms (such as visualization) by which the investigator assimilates the data produced through simulation. The latter allows for further refinement of the simulation science process (through model correction, increased numerical resolution, or algorithm debugging, etc.) and makes possible scientific statements about the physical phenomena being investigated.

Tremendous effort has been exerted over many decades in the pursuit of numerical methods that are both flexible and accurate, hence providing sufficient fidelity to be employed in the numerical solution of a large number of models, and sufficient analysis of accuracy to allow researchers to focus their attention on model refinement and uncertainty quantification. High-order finite element methods (also known as spectral/hp element methods), using either the continuous Galerkin or discontinuous Galerkin formulation, have reached a level of sophistication that allows them to be commonly applied to a diverse set of real-life engineering problems in computational solid mechanics, fluid dynamics, acoustics and electromagnetics. Many of the physical problems of interest are, unfortunately, not steady-state, leading to simulations that must run for a long time (days, weeks and in some cases months). Thus, in the absence of creative solutions, datasets can easily consume all available storage and networking resources. Examples of such simulations within fluid dynamics include all simulations in which the fluid is in transition or fully turbulent. With regard to ARO interests, problems in turbo-machinery and rotorcraft, where aspects of the geometry are rotating and/or sliding past one another, fall into this category. High-order finite element methods are now beginning to be used to simulate these physical systems due to their inherent ability to capture complex structures (such as vortices) with little numerical dissipation and dispersion. The transient nature of these simulations complicates the data handling (post-processing requires the time history) and renders single snapshots of the solution insufficient to understand the time-varying nature of the physics.

Objective
Our research objectives are two-fold: (1) We will develop dimensionality-reducing feature extraction methods appropriate for high-order FEM, such as vortex core extraction, that can run as part of an in situ data processing pipeline. (2) Given the exploratory nature inherent in analyzing and visualizing transient phenomena, we may specify regions of interest in an in situ fashion within a simulation field based upon the visualization objective, extract the relevant high-order FEM information and transmit the result to our visualization system, and then reconstruct the visualization features of interest with cognizance of verification and validation (V&V).

Publications in Scientific Computing:

A bandit-learning approach to multifidelity approximation
Subtitled “arXiv preprint arXiv:2103.15342,” Y. Xu, V. Keshavarzzadeh, R. M. Kirby, A. Narayan. 2021.

Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate a multifidelity approximation as a modified stochastic bandit, and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. These results are then extended to the case of vector-valued responses, where we demonstrate that the algorithm is efficient without the need to worry about estimating high-dimensional parameters. The main advantage of our approach is that we require neither hierarchical model structure nor a priori knowledge of statistical information (e.g., correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates of querying each model. Numerical experiments are provided at the end to support our theoretical findings.
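The explore-then-exploit structure described in this abstract can be sketched in a heavily simplified form. The code below is not the AETC algorithm from the paper: it uses a fixed exploration length, regresses the high-fidelity output on one low-fidelity model at a time, and assumes all models are functions of a shared standard normal input. The names `fit_line` and `explore_then_commit`, and all parameters, are hypothetical.

```python
import random

def fit_line(x, y):
    """Least-squares fit y ~ a + b * x; returns (a, b, residual variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, resid

def explore_then_commit(high, lows, cost_high, costs_low, budget, n_explore, seed=0):
    """Estimate E[high(Z)] by exploring all models, then committing to one."""
    rng = random.Random(seed)
    # Exploration: query every model on the same n_explore random inputs.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_explore)]
    y = [high(zi) for zi in z]
    remaining = budget - n_explore * (cost_high + sum(costs_low))
    best = None
    for k, low in enumerate(lows):
        n_commit = int(remaining // costs_low[k])
        if n_commit < 1:
            continue
        x = [low(zi) for zi in z]
        a, b, resid = fit_line(x, y)
        mx = sum(x) / n_explore
        var_x = sum((xi - mx) ** 2 for xi in x) / (n_explore - 1)
        # Rough estimated mean-squared error: regression noise from the
        # exploration phase plus sampling noise from the exploitation phase.
        mse = resid / n_explore + b * b * var_x / n_commit
        if best is None or mse < best[0]:
            best = (mse, k, a, b, n_commit)
    if best is None:
        raise ValueError("budget too small to exploit any model")
    _, k, a, b, n_commit = best
    # Exploitation: spend the remaining budget on the chosen cheap model only.
    low_mean = sum(lows[k](rng.gauss(0.0, 1.0)) for _ in range(n_commit)) / n_commit
    return k, a + b * low_mean
```

The function returns the index of the low-fidelity model it committed to and the resulting estimate. Note that only the identity of the high-fidelity model and the per-query costs are supplied; correlations between models are estimated from the exploration samples.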

L1-based reduced over collocation and hyper reduction for steady state and time-dependent nonlinear equations
Y. Chen, L. Ji, A. Narayan, Z. Xu. In Journal of Scientific Computing, Vol. 87, No. 1, Springer US, pp. 1--21. 2021.

The task of repeatedly solving parametrized partial differential equations (pPDEs) in optimization, control, or interactive applications makes it imperative to design highly efficient and equally accurate surrogate models. The reduced basis method (RBM) presents itself as such an option. Accompanied by a mathematically rigorous error estimator, RBM carefully constructs a low-dimensional subspace of the parameter-induced high fidelity solution manifold on which an approximate solution is computed. It can improve efficiency by several orders of magnitude by leveraging an offline-online decomposition procedure. However, this decomposition, usually implemented with aid from the empirical interpolation method (EIM) for nonlinear and/or parametric-nonaffine PDEs, can be challenging to implement or can result in severely degraded online efficiency. In this paper, we augment and extend the EIM approach as a direct solver, as opposed to an assistant, for solving nonlinear pPDEs on the reduced level. The resulting method, called Reduced Over-Collocation method (ROC), is stable and capable of avoiding efficiency degradation exhibited in traditional applications of EIM. Two critical ingredients of the scheme are collocation at about twice as many locations as the dimension of the reduced approximation space, and an efficient L1-norm-based error indicator for the strategic selection of the parameter values whose snapshots span the reduced approximation space. Together, these two ingredients ensure that the proposed L1-ROC scheme is both offline- and online-efficient. A distinctive feature is that the efficiency degradation appearing in alternative RBM approaches that utilize EIM for nonlinear and nonaffine problems is circumvented, both in the offline and online stages. Numerical tests on different families of time-dependent and steady-state nonlinear problems demonstrate the high efficiency and accuracy of L1-ROC and its superior stability performance.

Optimal design for kernel interpolation: Applications to uncertainty quantification
A. Narayan, L. Yan, T. Zhou. In Journal of Computational Physics, Vol. 430, Academic Press, pp. 110094. 2021.

The paper is concerned with classic kernel interpolation methods, in addition to approximation methods that are augmented by gradient measurements. To apply kernel interpolation using radial basis functions (RBFs) in a stable way, we propose a type of quasi-optimal interpolation points, searching from a large set of candidate points, using a procedure similar to designing Fekete points or power function maximizing points that use pivots from a Cholesky decomposition. The proposed quasi-optimal points result in a smaller condition number, and thus mitigate the instability of the interpolation procedure when the number of points becomes large. Applications to parametric uncertainty quantification are presented, and it is shown that the proposed interpolation method can outperform sparse grid methods in many interesting cases. We also demonstrate that the new procedure can be applied to constructing gradient-enhanced Gaussian process emulators.
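The power-function-maximizing selection that the abstract compares against can be sketched with a partial pivoted Cholesky factorization of the kernel matrix over the candidate set. This is an illustration of that classical greedy procedure, not the paper's quasi-optimal construction; the one-dimensional Gaussian kernel, its length scale, and the function names are assumptions made for the example.

```python
import math

def gaussian_kernel(x, y, length_scale=0.5):
    # Assumed radial basis function for this example (1D Gaussian).
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))

def greedy_points(candidates, n_points, kernel=gaussian_kernel):
    """Select points by partial pivoted Cholesky of the kernel matrix.

    At each step the pivot is the candidate with the largest residual
    diagonal entry, which equals the squared power function given the
    points already chosen.
    """
    m = len(candidates)
    diag = [kernel(c, c) for c in candidates]  # residual diagonal
    factors = []                               # columns of the Cholesky factor
    chosen = []
    for _ in range(n_points):
        p = max(range(m), key=lambda i: diag[i])
        if diag[p] <= 1e-12:
            break  # the remaining candidates add no new information
        scale = math.sqrt(diag[p])
        col = []
        for i in range(m):
            v = kernel(candidates[i], candidates[p])
            v -= sum(f[i] * f[p] for f in factors)
            col.append(v / scale)
        for i in range(m):
            diag[i] -= col[i] ** 2
        factors.append(col)
        chosen.append(candidates[p])
    return chosen
```

For example, `greedy_points([i / 20 for i in range(21)], 3)` picks points that are spread far apart on the interval, since each new pivot lands where the already chosen points explain the kernel least.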

Hyperbolicity-Preserving and Well-Balanced Stochastic Galerkin Method for Two-Dimensional Shallow Water Equations
D. Dai, Y. Epshteyn, A. Narayan. In SIAM Journal on Scientific Computing, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A929-A952. 2021.

Stochastic Galerkin formulations of the two-dimensional shallow water systems parameterized with random variables may lose hyperbolicity, and hence change the nature of the original model. In this work, we present a hyperbolicity-preserving stochastic Galerkin formulation by carefully selecting the polynomial chaos approximations to the nonlinear terms in the shallow water equations. We derive a sufficient condition to preserve the hyperbolicity of the stochastic Galerkin system which requires only a finite collection of positivity conditions on the stochastic water height at selected quadrature points in parameter space. Based on our theoretical results for the stochastic Galerkin formulation, we develop a corresponding well-balanced hyperbolicity-preserving central-upwind scheme. We demonstrate the accuracy and the robustness of the new scheme on several challenging numerical tests.

A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication
M. Rasouli, R. M. Kirby, H. Sundar. In The International Conference on High Performance Computing in Asia-Pacific Region, pp. 110-119. 2021.

Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.

Optimal allocation of computational resources based on Gaussian process: Application to molecular dynamics simulations
J. Chilleri, Y. He, D. Bedrov, R. M. Kirby. In Computational Materials Science, Vol. 188, Elsevier, pp. 110178. 2021.

Simulation models have been utilized in a wide range of real-world applications for behavior predictions of complex physical systems or material designs of large structures. While extensive simulation is mathematically preferable, external limitations such as available resources are often necessary considerations. With a fixed computational resource (i.e., total simulation time), we propose a Gaussian process-based numerical optimization framework for optimal time allocation over simulations at different locations, so that a surrogate model with uncertainty estimation can be constructed to approximate the full simulation. The proposed framework is demonstrated first via two synthetic problems, and later using a real test case of a glass-forming system with divergent dynamic relaxations where a Gaussian process is constructed to estimate the diffusivity and its uncertainty with respect to the temperature.

Deep coregionalization for the emulation of simulation-based spatial-temporal fields
W. W. Xing, R. M. Kirby, S. Zhe. In Journal of Computational Physics, Academic Press, pp. 109984. 2021.

Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulations, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-driven surrogates for high-dimensional outputs, most methods require a significant number of samples to cover the response surface in order to achieve a reasonable degree of accuracy. This demand makes the idea of surrogate models less attractive considering the high computational cost of generating the data. To address this issue, we exploit the multifidelity nature of a PDE simulation and introduce deep coregionalization, a Bayesian nonparametric autoregressive framework for efficient emulation of spatial-temporal fields. To effectively extract the output correlations in the context of multifidelity data, we develop a novel dimension reduction technique, residual principal component analysis. Our model can simultaneously capture the rich output correlations and the fidelity correlations and make high-fidelity predictions with only a small number of expensive, high-fidelity simulation samples. We show the advantages of our model in three canonical PDE models and a fluid dynamics problem. The results show that the proposed method can not only approximate simulation results with significantly less cost (by about 10%-25%) but also further improve model accuracy.

Multi-Fidelity High-Order Gaussian Processes for Physical Simulation
Z. Wang, W. Xing, R. Kirby, S. Zhe. In International Conference on Artificial Intelligence and Statistics, PMLR, pp. 847-855. 2021.

The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs. Existing GP methods are either not scalable to high-dimensional outputs or lack effective strategies to integrate multi-fidelity examples. To address these issues, we propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations both between the outputs and between the fidelities to enhance solution estimation, and scale to large numbers of outputs. Based on a novel nonlinear coregionalization model, MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights to capture the (nonlinear) relationships across the fidelities. To improve inference efficiency and quality, we use bases decomposition to largely reduce the model parameters, and layer-wise matrix Gaussian posteriors to capture the posterior dependency and to simplify the computation. Our stochastic variational learning algorithm successfully handles millions of outputs without extra sparse approximations. We show the advantages of our method in several typical applications.

An open-source parallel code for computing the spectral fractional Laplacian on 3D complex geometry domains
M. Carlson, X. Zheng, H. Sundar, G. E. Karniadakis, R. M. Kirby. In Computer Physics Communications, Vol. 261, North-Holland, pp. 107695. 2021.

We present a spectral element algorithm and open-source code for computing the fractional Laplacian defined by the eigenfunction expansion on finite 2D/3D complex domains with both homogeneous and nonhomogeneous boundaries. We demonstrate the scalability of the spectral element algorithm on large clusters by constructing the fractional Laplacian based on computed eigenvalues and eigenfunctions using up to thousands of CPUs. To demonstrate the accuracy of this eigen-based approach for computing the fractional Laplacian, we approximate the solutions of the fractional diffusion equation using the computed eigenvalues and eigenfunctions on a 2D quadrilateral, and on a 3D cubic and cylindrical domain, and compare the results with the contrived solutions to demonstrate fast convergence. Subsequently, we present simulation results for a fractional diffusion equation on a hand-shaped domain discretized with 3D hexahedra, as well as on a domain constructed from the Hanford site geometry corresponding to nonzero Dirichlet boundary conditions. Finally, we apply the algorithm to solve the surface quasi-geostrophic (SQG) equation on a 2D square with periodic boundaries. Simulation results demonstrate the accuracy, efficiency, and geometric flexibility of our algorithm and that our algorithm can capture the subtle dynamics of anomalous diffusion modeled by the fractional Laplacian on complex geometry domains. The included open-source code is the first of its kind.

Towards an Extrinsic, CG-XFEM Approach Based on Hierarchical Enrichments for Modeling Progressive Fracture
Subtitled “arXiv preprint arXiv:2104.14704,” M. K. Ballard, R. Amici, V. Shankar, L. A. Ferguson, M. Braginsky, R. M. Kirby. 2021.

We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D static elasticity with linearized strain for modeling open cracks as a salient step towards modeling progressive fracture. The implementation includes a description of the finite element model, hybrid implicit/explicit representation of enrichments, numerical integration method, and novel degree-of-freedom (DoF) enumeration algorithm. This algorithm supports an arbitrary number of enrichments within an element, while simultaneously maintaining a CG solution across elements. Additionally, our approach easily allows an implementation suitable for distributed computing systems. Enabled by the DoF enumeration algorithm, the proposed method lays the groundwork for a computational tool that efficiently models progressive fracture. To facilitate a discussion of the complex enrichment hierarchies, we develop enrichment diagrams to succinctly describe and visualize the relationships between the enrichments (and the fields they create) within an element. This also provides a unified language for discussing extrinsic XFEM methods in the literature. We compare several methods, relying on the enrichment diagrams to highlight their nuanced differences.

Residual Gaussian process: A tractable nonparametric Bayesian emulator for multi-fidelity simulations
W. W. Xing, A. A. Shah, P. Wang, S. Zhe, Q. Fu, R. M. Kirby. In Applied Mathematical Modelling, Vol. 97, Elsevier, pp. 36-56. 2021.

Challenges in multi-fidelity modelling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case.

Evaluation of GPU Volume Rendering in PyTorch Using Data-Parallel Primitives
N. Marshak, P. Grosset, A. Knoll, J. P. Ahrens, C. R. Johnson. In Eurographics Symposium on Parallel Graphics and Visualization (EGPGV), 2021.

Data-parallel programming (DPP) has attracted considerable interest from the visualization community, fostering major software initiatives such as VTK-m. However, there has been relatively little recent investigation of data-parallel APIs in higher-level languages such as Python, which could help developers sidestep the need for low-level application programming in C++ and CUDA. Moreover, machine learning frameworks exposing data-parallel primitives, such as PyTorch and TensorFlow, have exploded in popularity, making them attractive platforms for parallel visualization and data analysis. In this work, we benchmark data-parallel primitives in PyTorch, and investigate its application to GPU volume rendering using two distinct DPP formulations: a parallel scan and reduce over the entire volume, and repeated application of data-parallel operators to an array of rays. We find that most relevant DPP primitives exhibit performance similar to a native CUDA library. However, our volume rendering implementation reveals that PyTorch is limited in expressiveness when compared to other DPP APIs. Furthermore, while render times are sufficient for an early "proof of concept", memory usage acutely limits scalability.
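To make the "scan and reduce" formulation concrete, here is a minimal single-ray sketch of front-to-back compositing written with two data-parallel primitives: an exclusive prefix product (scan) for the transmittance and a sum (reduce) for the final intensity. It uses the Python standard library as a stand-in rather than PyTorch, and the function names and sample values are assumptions made for illustration; it is not the paper's implementation.

```python
from itertools import accumulate
import operator

def composite_ray_scan(colors, alphas):
    """Front-to-back compositing of one ray via scan + reduce.

    transmittance[i] = product of (1 - alpha[j]) for j < i  (exclusive scan)
    result           = sum of color[i] * alpha[i] * transmittance[i]  (reduce)
    """
    inclusive = list(accumulate((1.0 - a for a in alphas), operator.mul))
    transmittance = [1.0] + inclusive[:-1]
    return sum(c * a * t for c, a, t in zip(colors, alphas, transmittance))

def composite_ray_loop(colors, alphas):
    # Reference sequential version of the same compositing rule.
    result, remaining = 0.0, 1.0
    for c, a in zip(colors, alphas):
        result += remaining * a * c
        remaining *= 1.0 - a
    return result

# Hypothetical samples along one ray (scalar intensities and opacities).
sample_colors = [1.0, 0.5, 0.25]
sample_alphas = [0.5, 0.5, 1.0]
scan_value = composite_ray_scan(sample_colors, sample_alphas)
loop_value = composite_ray_loop(sample_colors, sample_alphas)
```

The scan form has no loop-carried dependency in the final sum, which is what makes it a candidate for data-parallel execution; in a tensor framework the exclusive product would typically be expressed with a cumulative-product primitive such as `torch.cumprod`.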

Laplacian smoothing stochastic gradient Markov chain Monte Carlo
B. Wang, D. Zou, Q. Gu, S. J. Osher. In SIAM Journal on Scientific Computing, Vol. 43, No. 1, SIAM, pp. A26-A53. 2021.

As an important Markov chain Monte Carlo (MCMC) method, the stochastic gradient Langevin dynamics (SGLD) algorithm has achieved great success in Bayesian learning and posterior sampling. However, SGLD typically suffers from a slow convergence rate due to its large variance caused by the stochastic gradient. In order to alleviate these drawbacks, we leverage the recently developed Laplacian smoothing technique and propose a Laplacian smoothing stochastic gradient Langevin dynamics (LS-SGLD) algorithm. We prove that for sampling from both log-concave and non-log-concave densities, LS-SGLD achieves strictly smaller discretization error in 2-Wasserstein distance, although its mixing rate can be slightly slower. Experiments on both synthetic and real datasets verify our theoretical results and demonstrate the superior performance of LS-SGLD on different machine learning tasks including posterior …

Stability and Generalization of the Decentralized Stochastic Gradient Descent
Subtitled “arXiv preprint arXiv:2102.01302,” T. Sun, D. Li, B. Wang. 2021.

The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has been studied extensively. Nevertheless, the community has paid little attention to its decentralized variants. In this paper, we provide a novel formulation of the decentralized stochastic gradient descent. Leveraging this formulation together with (non)convex optimization theory, we establish the first stability and generalization guarantees for the decentralized stochastic gradient descent. Our theoretical results are built on top of a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models.

Robust Certification for Laplace Learning on Geometric Graphs
Subtitled “arXiv preprint arXiv:2104.10837,” M. Thorpe, B. Wang. 2021.

Graph Laplacian (GL)-based semi-supervised learning is one of the most used approaches for classifying nodes in a graph. Understanding and certifying the adversarial robustness of machine learning (ML) algorithms has attracted large amounts of attention from different research communities due to its crucial importance in many security-critical applied domains. There is great interest in the theoretical certification of adversarial robustness for popular ML algorithms. In this paper, we provide the first adversarial robust certification for the GL classifier. More precisely, we quantitatively bound the difference in the classification accuracy of the GL classifier before and after an adversarial attack. Numerically, we validate our theoretical certification results and show that leveraging existing adversarial defenses for the k-nearest neighbor classifier can remarkably improve the robustness of the GL classifier.

Decentralized Federated Averaging
Subtitled “arXiv preprint arXiv:2104.11375,” T. Sun, D. Li, B. Wang. 2021.

Federated averaging (FedAvg) is a communication efficient algorithm for the distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients. FedAvg is mostly studied in centralized fashions, which requires massive communication between server and clients in each communication. Moreover, attacking the central server can break the whole system's privacy. In this paper, we study the decentralized FedAvg with momentum (DFedAvgM), which is implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. We prove convergence of the (quantized) DFedAvgM under trivial assumptions; the convergence rate can be improved when the loss function satisfies the PŁ property. Finally, we numerically verify the efficacy of DFedAvgM.
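The communication pattern described above (local momentum steps followed by averaging with graph neighbors only) can be sketched on a toy problem. The example below is a simplified illustration, not the paper's algorithm as published: it uses scalar models, exact gradients of quadratic losses in place of stochastic gradients, uniform neighbor averaging on a ring, a momentum buffer that restarts each round, and no quantization. All names and hyperparameters are assumptions.

```python
def local_momentum_steps(x, grad, lr, momentum, steps):
    # Heavy-ball gradient steps on one client; the velocity restarts every
    # round (an assumption made for this sketch).
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad(x)
        x -= lr * velocity
    return x

def decentralized_fedavg_momentum(targets, neighbors, rounds=1000,
                                  lr=0.01, momentum=0.5, local_steps=2):
    """Toy decentralized averaging: client i minimizes 0.5 * (x - targets[i])**2.

    neighbors[i] lists the clients whose models client i averages,
    including i itself. There is no central server.
    """
    models = [0.0 for _ in targets]
    for _ in range(rounds):
        local = [
            local_momentum_steps(x, lambda z, a=a: z - a, lr, momentum, local_steps)
            for x, a in zip(models, targets)
        ]
        models = [
            sum(local[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(local))
        ]
    return models

# Hypothetical setup: five clients on a ring, each talking to two neighbors.
n_clients = 5
ring = [[(i - 1) % n_clients, i, (i + 1) % n_clients] for i in range(n_clients)]
client_targets = [0.0, 1.0, 2.0, 3.0, 4.0]
final_models = decentralized_fedavg_momentum(client_targets, ring)
global_minimizer = sum(client_targets) / n_clients
```

With a constant step size the clients do not agree exactly: their average reaches the minimizer of the mean loss, while individual models settle in a small neighborhood of it whose size shrinks with the step size.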

Leveraging 31 Million Google Street View Images to Characterize Built Environments and Examine County Health Outcomes
Q. C. Nguyen, J. M. Keralis, P. Dwivedi, A. E. Ng, M. Javanmardi, S. Khanna, Y. Huang, K. D. Brunisholz, A. Kumar, T. Tasdizen. In Public Health Reports, Vol. 136, No. 2, SAGE Publications, pp. 201-211. 2021.
DOI: doi.org/10.1177/0033354920968799

Objectives

Built environments can affect health, but data in many geographic areas are limited. We used a big data source to create national indicators of neighborhood quality and assess their associations with health.

Methods

We leveraged computer vision and Google Street View images accessed from December 15, 2017, through July 17, 2018, to detect features of the built environment (presence of a crosswalk, non–single-family home, single-lane roads, and visible utility wires) for 2916 US counties. We used multivariate linear regression models to determine associations between features of the built environment and county-level health outcomes (prevalence of adult obesity, prevalence of diabetes, physical inactivity, frequent physical and mental distress, poor or fair self-rated health, and premature death [in years of potential life lost]).

Results

Compared with counties with the least number of crosswalks, counties with the most crosswalks were associated with decreases of 1.3%, 2.7%, and 1.3% of adult obesity, physical inactivity, and fair or poor self-rated health, respectively, and 477 fewer years of potential life lost before age 75 (per 100 000 population). The presence of non–single-family homes was associated with lower levels of all health outcomes except for premature death. The presence of single-lane roads was associated with an increase in physical inactivity, frequent physical distress, and fair or poor self-rated health. Visible utility wires were associated with increases in adult obesity, diabetes, physical and mental distress, and fair or poor self-rated health.

Conclusions

The use of computer vision and big data image sources makes possible national studies of the built environm

Understanding a program's resiliency through error propagation
Z. Li, H. Menon, K. Mohror, P. T. Bremer, Y. Livnat, V. Pascucci. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 362-373. 2021.

Aggressive technology scaling trends have worsened the transient fault problem in high-performance computing (HPC) systems. Some faults are benign, but others can lead to silent data corruption (SDC), which represents a serious problem: a fault introduces an error into an HPC simulation that is not readily detected. Due to the insidious nature of SDCs, researchers have worked to understand their impact on applications. Previous studies have relied on expensive fault injection campaigns with uniform sampling to provide overall SDC rates, but this solution does not provide any feedback on the code regions without samples.

Blueprint: Cyberinfrastructure Center of Excellence
Subtitled “arXiv,” E. Deelman, A. Mandal, A. P. Murillo, J. Nabrzyski, V. Pascucci, R. Ricci, I. Baldin, S. Sons, L. Christopherson, C. Vardeman, R. F. da Silva, J. Wyngaard, S. Petruzza, M. Rynge, K. Vahi, W. R. Whitcup, J. Drake, E. Scott. 2021.

In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand MFs' CI, the Pilot has developed and validated a model of the MF data lifecycle that follows the data generation and management within a facility and gained an understanding of how this model captures the fundamental stages that the facilities' data passes through from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a …

Symplectic Time Integration Methods for the Material Point Method, Experiments, Analysis and Order Reduction
M. Berzins. In WCCM-ECCOMAS2020 virtual Conference, Note: Minor typographical correction in March 2024, January, 2021.

The provision of appropriate time integration methods for the Material Point Method (MPM) involves considering stability, accuracy and energy conservation. Symplectic time integration methods are a widely used class of methods that addresses many of these issues. Such methods have good conservation properties and have the potential to achieve high accuracy. In this work we build on the work in [5] and consider high order methods for the time integration of the Material Point Method. The results of practical experiments show that while high order methods in both space and time have good accuracy initially, unless the problem has relatively little particle movement, the accuracy of the methods at later times is closer to that of low order methods. A theoretical analysis explains these results as being similar to the stage error found in Runge-Kutta methods, though in this case the stage error arises from the MPM differentiations and interpolations from particles to grid and back again, particularly in cases in which there are many grid crossings.
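The conservation property that motivates symplectic integrators can be seen on a much simpler system than MPM. The sketch below applies velocity Verlet (a standard second-order symplectic scheme) and forward Euler to a unit harmonic oscillator and compares the energy after many steps. The test problem, step size, and function names are assumptions made for illustration and are unrelated to the paper's experiments.

```python
def velocity_verlet(q, p, force, dt, steps):
    # Second-order symplectic scheme: half kick, drift, half kick.
    for _ in range(steps):
        p_half = p + 0.5 * dt * force(q)
        q = q + dt * p_half
        p = p_half + 0.5 * dt * force(q)
    return q, p

def forward_euler(q, p, force, dt, steps):
    # Non-symplectic reference scheme for comparison.
    for _ in range(steps):
        q, p = q + dt * p, p + dt * force(q)
    return q, p

def oscillator_energy(q, p):
    # Total energy of a unit-mass, unit-stiffness oscillator.
    return 0.5 * (p * p + q * q)

def spring_force(q):
    return -q

q0, p0 = 1.0, 0.0
initial_energy = oscillator_energy(q0, p0)

verlet_energy = oscillator_energy(*velocity_verlet(q0, p0, spring_force, 0.1, 10000))
euler_energy = oscillator_energy(*forward_euler(q0, p0, spring_force, 0.1, 10000))

verlet_drift = abs(verlet_energy - initial_energy)
euler_drift = abs(euler_energy - initial_energy)
```

For this linear problem the Verlet energy error stays bounded at the order of the squared step size over the whole run, while the forward Euler energy grows by a fixed factor every step.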
