**OpenMP 4 Fortran Modernization of WSM6 for KNL**
T.A.J. Ouermi, A. Knoll, R.M. Kirby, M. Berzins. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (PEARC17), No. 12, ACM, pp. 12:1--12:8. 2017. ISBN: 978-1-4503-5272-7. DOI: 10.1145/3093338.3093387

Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well-established and increasingly supported vehicle for portable parallelization. As architectures mature and compiler OpenMP implementations evolve, best practices for code modernization change as well. In this paper, we examine the impact of newer OpenMP features (in particular OMP SIMD) on the Intel Xeon Phi Knights Landing (KNL) architecture, applied to optimizing loops in the single-moment 6-class microphysics module (WSM6) in the US Navy's NEPTUNE code. We find that with functioning OMP SIMD constructs, low thread invocation overhead on KNL, and a reduced penalty for unaligned access compared to previous architectures, one can leverage OpenMP 4 to achieve reasonable scalability with relatively minor reorganization of a production physics code.
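As a rough illustration of the OMP SIMD pattern the paper applies (the actual WSM6 code is Fortran; this hypothetical C++ fragment, with made-up field names and constants, only sketches the threads-over-columns, SIMD-over-levels structure):

```cpp
// Hypothetical sketch: outer OpenMP thread loop over columns, inner
// OMP SIMD loop over vertical levels -- the loop structure the paper
// applies to microphysics kernels on KNL. Field names and the update
// formula are invented for illustration.
#include <vector>
#include <cstdio>

int main() {
    const int ncols = 1024, nlevs = 64;
    std::vector<double> qv(ncols * nlevs, 1.0e-3);   // made-up water vapor mixing ratio
    std::vector<double> temp(ncols * nlevs, 273.0);  // made-up temperature field

    #pragma omp parallel for                 // threads across columns
    for (int i = 0; i < ncols; ++i) {
        #pragma omp simd                     // vectorize over vertical levels
        for (int k = 0; k < nlevs; ++k) {
            const int idx = i * nlevs + k;
            // Stand-in for a pointwise microphysics update.
            temp[idx] += 2.5e6 / 1004.0 * qv[idx] * 0.01;
        }
    }

    std::printf("temp[0] = %f\n", temp[0]);
    return 0;
}
```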
**Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks**
J. K. Holmen, A. Humphrey, D. Sutherland, M. Berzins. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (PEARC17), No. 27, pp. 27:1--27:8. 2017. ISBN: 978-1-4503-5272-7. DOI: 10.1145/3093338.3093388

The University of Utah's Carbon Capture Multidisciplinary Simulation Center (CCMSC) is using the Uintah Computational Framework to predict performance of a 1000 MWe ultra-supercritical clean coal boiler. The center aims to utilize the Intel Xeon Phi-based DOE systems, Theta and Aurora, through the Aurora Early Science Program by using the Kokkos C++ library to enable node-level performance portability. This paper describes infrastructure advancements and portability improvements made possible by our integration of Kokkos within Uintah. Scalability results are presented that compare serial and data parallel task execution models for a challenging radiative heat transfer calculation, central to the center's predictive boiler simulations. These results both demonstrate good strong-scaling characteristics to 256 Knights Landing (KNL) processors on the NSF Stampede system and show the KNL-based calculation to be competitive with prior GPU-based results for the same calculation.
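A minimal sketch, assuming a generic cell-centered field update rather than Uintah's actual task interface, of the kind of Kokkos data-parallel loop described here (the field names `phi` and `rhs` are made up):

```cpp
// Minimal Kokkos sketch: a data-parallel "task body" that runs one
// iteration per cell on whichever backend Kokkos was configured with
// (OpenMP on KNL, CUDA on GPUs). Not Uintah's actual task API.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int ncells = 1 << 20;
        Kokkos::View<double*> phi("phi", ncells);   // hypothetical cell-centered field
        Kokkos::View<double*> rhs("rhs", ncells);   // hypothetical right-hand side

        Kokkos::parallel_for("update_phi", ncells, KOKKOS_LAMBDA(const int i) {
            phi(i) += 0.5 * rhs(i);                 // pointwise update per cell
        });
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```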
**An Overview of Performance Portability in the Uintah Runtime System through the Use of Kokkos**
D. Sunderland, B. Peterson, J. Schmidt, A. Humphrey, J. Thornock, M. Berzins. In 2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), IEEE, Nov, 2016. DOI: 10.1109/espm2.2016.012

The current diversity in nodal parallel computer architectures is seen in machines based upon multicore CPUs, GPUs, and the Intel Xeon Phi. A class of approaches for enabling scalability of complex applications on such architectures is based upon Asynchronous Many Task software architectures, such as that in the Uintah framework used for the parallel solution of solid and fluid mechanics problems. Uintah has both an applications layer with its own programming model and a separate runtime system. While Uintah scales well today, it is necessary to address nodal performance portability in order for it to continue to do so. Incrementally modifying Uintah to use the Kokkos performance portability library in prototyping experiments improved kernel performance by more than a factor of two.
**In Situ Exploration of Particle Simulations with CPU Ray Tracing**
W. Usher, I. Wald, A. Knoll, M. Papka, V. Pascucci. In Supercomputing Frontiers and Innovations, Vol. 3, No. 4, 2016. ISSN: 2313-8734. DOI: 10.14529/jsfi160401

We present a system for interactive in situ visualization of large particle simulations, suitable for general CPU-based HPC architectures. As simulations grow in scale, in situ methods are needed to alleviate IO bottlenecks and visualize data at full spatio-temporal resolution. We use a lightweight, loosely-coupled layer serving distributed data from the simulation to a data-parallel renderer running in separate processes. Leveraging the OSPRay ray tracing framework for visualization and balanced P-k-d trees, we can render simulation data in real time, as it arrives, with negligible memory overhead. This flexible solution allows users to perform exploratory in situ visualization on the same computational resources as the simulation code, on dedicated visualization clusters, or on remote workstations, via a standalone rendering client that can be connected or disconnected as needed. We evaluate this system on simulations with up to 227M particles in the LAMMPS and Uintah computational frameworks, and show that our approach provides many of the advantages of tightly-coupled systems, with the flexibility to render on a wide variety of remote and co-processing resources.
**VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures**
K. Moreland, C. Sewell, W. Usher, L. Lo, J. Meredith, D. Pugmire, J. Kress, H. Schroots, K. Ma, H. Childs, M. Larsen, C. Chen, R. Maynard, B. Geveci. In IEEE Computer Graphics and Applications, Vol. 36, No. 3, pp. 48--58. May, 2016. ISSN: 0272-1716. DOI: 10.1109/MCG.2016.48

Traditional scientific visualization software approaches do not fare well in massively threaded environments. To address the needs of the high-performance computing community, the VTK-m framework fills the gaps in functionality by bringing together the most recent research.
**Improving accuracy in the MPM method using a null space filter**
C. Gritton, M. Berzins. In Computational Particle Mechanics, pp. 1--12. 2016. ISSN: 2196-4386. DOI: 10.1007/s40571-016-0134-3

The material point method (MPM) has been very successful in providing solutions to many challenging problems involving large deformations. Nevertheless, some important issues remain to be resolved with regard to its analysis. One key challenge applies to both MPM and particle-in-cell (PIC) methods and arises from the difference between the number of particles and the number of nodal grid points to which the particles are mapped. This difference gives rise to a non-trivial null space of the linear operator that maps particle values onto nodal grid point values. In other words, there are non-zero particle values that, when mapped to the grid nodes, result in a zero value there. Moreover, when the nodal values at the grid points are mapped back to particles, part of those particle values may lie in that same null space. Given positive mapping weights from particles to nodes, such null space values are oscillatory in nature. While this problem has been observed almost since the beginning of PIC methods, there are still elements of it that are problematic today, as well as methods that transcend it. The null space may be viewed as being connected to the ringing instability identified by Brackbill for PIC methods. It is shown that these null space values can be removed from the solution using a null space filter. This filter improves the accuracy of MPM using an approach based upon a local singular value decomposition (SVD) calculation. This local SVD approach is compared against the global SVD approach previously considered by the authors and against a recent MPM method by Zhang and colleagues.
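A toy numerical sketch of the null-space idea (not the paper's local filter itself; the 2-node, 4-particle weights below are invented for illustration), using Eigen to project particle values onto the row space of the particle-to-grid mapping and thereby discard the null-space component:

```cpp
// Toy sketch: S maps particle values to grid-node values; with more
// particles than nodes, S has a non-trivial null space. Projecting the
// particle data onto the row space of S (via SVD) removes the null-space
// component without changing the grid values. Weights are made up.
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd S(2, 4);                 // 2 grid nodes, 4 particles
    S << 0.75, 0.25, 0.25, 0.75,
         0.25, 0.75, 0.75, 0.25;

    Eigen::VectorXd p(4);
    p << 1.0, 2.0, 3.0, 4.0;                 // particle values

    Eigen::JacobiSVD<Eigen::MatrixXd> svd(S, Eigen::ComputeThinV);
    Eigen::MatrixXd V = svd.matrixV();       // columns span the row space of S

    Eigen::VectorXd p_filtered = V * (V.transpose() * p);  // drop null-space part

    std::cout << "change in grid values (should be ~0): "
              << (S * p - S * p_filtered).norm() << "\n";
    return 0;
}
```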
**An Overview of Performance Portability in the Uintah Runtime System Through the Use of Kokkos**
D. Sunderland, B. Peterson, J. Schmidt, A. Humphrey, J. Thornock, M. Berzins. In Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), Salt Lake City, Utah, IEEE Press, Piscataway, NJ, USA, pp. 44--47. 2016. ISBN: 978-1-5090-3858-9. DOI: 10.1109/ESPM2.2016.10

The current diversity in nodal parallel computer architectures is seen in machines based upon multicore CPUs, GPUs, and the Intel Xeon Phi. A class of approaches for enabling scalability of complex applications on such architectures is based upon Asynchronous Many Task software architectures, such as that in the Uintah framework used for the parallel solution of solid and fluid mechanics problems. Uintah has both an applications layer with its own programming model and a separate runtime system. While Uintah scales well today, it is necessary to address nodal performance portability in order for it to continue to do so. Incrementally modifying Uintah to use the Kokkos performance portability library in prototyping experiments improved kernel performance by more than a factor of two.
**Packing Configurations of PBX-9501 Cylinders to Reduce the Probability of a Deflagration to Detonation Transition (DDT)**
J. Beckvermit, T. Harman, C. Wight, M. Berzins. In Propellants, Explosives, Pyrotechnics, 2016. ISSN: 1521-4087. DOI: 10.1002/prep.201500331

The detonation of hundreds of explosive devices from either a transportation or storage accident is an extremely dangerous event. This paper focuses on identifying ways of packing/storing arrays of explosive cylinders that will reduce the probability of a Deflagration to Detonation Transition (DDT). The Uintah Computational Framework was utilized to predict the conditions necessary for a large-scale DDT to occur. The results showed that the arrangement of the explosive cylinders and the number of devices packed in a "box" greatly affects the probability of a detonation.
**TOD-Tree: Task-Overlapped Direct Send Tree Image Compositing for Hybrid MPI Parallelism and GPUs**
A. V. P. Grosset, M. Prasad, C. Christensen, A. Knoll, C. Hansen. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1--1. 2016. DOI: 10.1109/tvcg.2016.2542069

Modern supercomputers have thousands of nodes, each with CPUs and/or GPUs capable of several teraflops. However, the network connecting these nodes is relatively slow, on the order of gigabits per second. For time-critical workloads such as interactive visualization, the bottleneck is no longer computation but communication. In this paper, we present an image compositing algorithm that works on both CPU-only and GPU-accelerated supercomputers and focuses on communication avoidance and on overlapping communication with computation, at the expense of evenly balancing the workload. The algorithm has three stages: a parallel direct send stage, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting on the Stampede and Edison supercomputers, show strong scaling results, and explain how we generally achieve better performance than these two algorithms. We also developed a GPU-based image compositing algorithm that uses CUDA kernels for computation and GPUDirect RDMA for inter-node GPU communication. We tested the algorithm on the Piz Daint GPU-accelerated supercomputer and show that we achieve performance on par with CPUs. Lastly, we introduce a workflow in which both rendering and compositing are done on the GPU.
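A minimal sketch, not the TOD-Tree algorithm itself, of the communication/computation overlap the abstract refers to: a gathering rank posts non-blocking receives for every other rank's image fragment and blends each fragment as soon as it arrives, rather than waiting for all of them first (the additive blend and image size here are purely illustrative):

```cpp
// Illustrative MPI sketch of overlapping receives with compositing work.
// Rank 0 blends fragments in whatever order they arrive; real compositors
// use a depth-ordered "over" operator and multiple stages.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int nvals = 256 * 256 * 4;                       // RGBA floats per fragment
    std::vector<float> local(nvals, 0.1f * (rank + 1));    // this rank's rendered fragment

    if (rank == 0) {
        std::vector<float> accum = local;
        std::vector<std::vector<float>> bufs(nranks - 1, std::vector<float>(nvals));
        std::vector<MPI_Request> reqs(nranks - 1);
        for (int r = 1; r < nranks; ++r)                    // post all receives up front
            MPI_Irecv(bufs[r - 1].data(), nvals, MPI_FLOAT, r, 0,
                      MPI_COMM_WORLD, &reqs[r - 1]);

        for (int done = 0; done < nranks - 1; ++done) {
            int idx;
            MPI_Waitany(nranks - 1, reqs.data(), &idx, MPI_STATUS_IGNORE);
            for (int i = 0; i < nvals; ++i)                 // blend the fragment that just arrived
                accum[i] += bufs[idx][i];
        }
    } else {
        MPI_Send(local.data(), nvals, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```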
**Radiative Heat Transfer Calculation on 16384 GPUs Using a Reverse Monte Carlo Ray Tracing Approach with Adaptive Mesh Refinement**
A. Humphrey, D. Sunderland, T. Harman, M. Berzins. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1222--1231. May, 2016. DOI: 10.1109/IPDPSW.2016.93

Modeling thermal radiation is computationally challenging in parallel due to its all-to-all physical and resulting computational connectivity, and it is also the dominant mode of heat transfer in practical applications such as next-generation clean coal boilers being modeled by the Uintah framework. However, a direct all-to-all treatment of radiation is prohibitively expensive on large computer systems, whether homogeneous or heterogeneous. DOE Titan and the planned DOE Summit and Sierra machines are examples of current and emerging GPU-based heterogeneous systems where the increased processing capability of GPUs over CPUs exacerbates this problem. These systems require that computational frameworks like Uintah leverage an arbitrary number of on-node GPUs, while simultaneously utilizing thousands of GPUs within a single simulation. We show that radiative heat transfer problems can be made to scale within Uintah on heterogeneous systems through a combination of reverse Monte Carlo ray tracing (RMCRT) techniques and adaptive mesh refinement (AMR), which reduces the amount of global communication. In particular, significant Uintah infrastructure changes, including a novel lock- and contention-free, thread-scalable data structure for managing MPI communication requests and improved memory allocation strategies, were necessary to achieve excellent strong scaling results to 16384 GPUs on Titan.