Designed especially for neurobiologists, FluoRender is an interactive tool for multi-channel fluorescence microscopy data visualization and analysis.

BrainStimulator is a set of networks that are used in SCIRun to perform simulations of brain stimulation such as transcranial direct current stimulation (tDCS) and magnetic transcranial stimulation (TMS).

Developing software tools for science has always been a central vision of the SCI Institute.

Scientific Computing

Numerical simulation of real-world phenomena provides fertile ground for building interdisciplinary relationships. The SCI Institute has a long tradition of building these relationships in a win-win fashion – a win for the theoretical and algorithmic development of numerical modeling and simulation techniques and a win for the discipline-specific science of interest. High-order and adaptive methods, uncertainty quantification, complexity analysis, and parallelization are just some of the topics being investigated by SCI faculty. These areas of computing are being applied to a wide variety of engineering applications ranging from fluid mechanics and solid mechanics to bioelectricity.

Martin Berzins

Parallel Computing
GPUs

Mike Kirby

Finite Element Methods
Uncertainty Quantification
GPUs

Valerio Pascucci

Scientific Data Management

Chris Johnson

Problem Solving Environments

Ross Whitaker

GPUs

Chuck Hansen

GPUs

Amir Arzani

Scientific machine learning
Data-driven fluid flow modeling

Funded Research Projects:

Optimal Approximation Algorithms in High Dimensions

Akil Narayan
The increasing power of modern computational hardware has enabled computer-based simulation of sophisticated mathematical models that resolve important physical phenomena in great detail. With the advent of these computational abilities has come an increased demand to include more complex physical interactions in the models, and thus an increased strain on computational resources. Modern engineering design utilizes such models, and these design problems typically involve (1) numerous tunable parameters that affect reliability, cost, and failure, (2) uncertainty about external influences manifesting as randomness in the model, and (3) epistemic ignorance involving model form uncertainty. In realistic applications, the collection of these effects leads to predictions that depend on a cumulatively high-dimensional parameter. This project focuses on development and deployment of novel, near-optimal experimental design and sampling algorithms for the accurate and efficient simulation of physical models parameterized by high-dimensional inputs. The work of this project involves the application of recently developed approximation theory results in the computational arena, targeted advances that extend theoretical mathematics for computational purposes, and the development and implementation of algorithms for large-scale computations.

The technical aspects of this project are designed to provide feasible computational algorithms and concrete mathematical guarantees for tasks in high-dimensional approximation. The three major core components for the completion of this task involve the design, implementation, and analysis of algorithms that leverage optimality characteristics of (1) random and deterministic experimental and sampling design, (2) computational algorithms for identifying efficient sampling schemes, and (3) strategies and techniques for emerging approximation paradigms such as sparse approximation and dimension reduction. A crosscutting theme is application of these methods to problems of modern interest in scientific computing. This project involves fundamental contributions to the fields of applied approximation theory and computational approximation methods through the development of applications-oriented sampling designs with provable near-optimality. Theoretical investigations of this project connect classical techniques in approximation and linear algebra with emerging algorithms in data reduction and reduced order modeling. The implementation of these algorithms will significantly enhance theoretical understanding and computational feasibility for goal-oriented design, parameter study and reduction, sparse and compressive representations, model verification and calibration, and data-driven simulations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Cyberinfrastructure Center of Excellence Pilot Study

Ewa Deelman, Valerio Pascucci, Anirban Mandal, Jaroslaw Nabrzyski, Robert Ricci
University of Southern California, Los Angeles, CA, United States

NSF's major multi-user research facilities (large facilities) are sophisticated research instruments and platforms - such as large telescopes, interferometers and distributed sensor arrays - that serve diverse scientific disciplines from astronomy and physics to geoscience and biological science. Large facilities are increasingly dependent on advanced cyberinfrastructure (CI) - computing, data and software systems, networking, and associated human capital - to enable broad delivery and analysis of facility-generated data. As a result of these cyber infrastructure tools, scientists and the public gain new insights into fundamental questions about the structure and history of the universe, the world we live in today, and how our plants and animals may change in the coming decades. The goal of this pilot project is to develop a model for a Cyberinfrastructure Center of Excellence (CI CoE) that facilitates community building and sharing and applies knowledge of best practices and innovative solutions for facility CI.

The pilot project will explore how such a center would facilitate CI improvements for existing facilities and for the design of new facilities that exploit advanced CI architecture designs and leverage establish tools and solutions. The pilot project will also catalyze a key function of an eventual CI CoE - to provide a forum for exchange of experience and knowledge among CI experts. The project will also gather best practices for large facilities, with the aim of enhancing individual facility CI efforts in the broader CI context. The discussion forum and planning effort for a future CI CoE will also address training and workforce development by expanding the pool of skilled facility CI experts and forging career paths for CI professionals. The result of this work will be a strategic plan for a CI CoE that will be evaluated and refined through community interactions: workshops and direct engagement with the facilities and the broader CI community.

This project is being supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering and the Division of Emerging Frontiers in the Directorate for Biological Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Efficiency and Productivity through Artificial Intelligence

Valerio Pascucci
Efficient cyberinfrastructure (advanced computing, data, software and networking infrastructure) is a critical component of the support that NSF provides for new discoveries in science and engineering. Cyberinfrastructure is complex and traditionally requires years of human hand-tuning to fully achieve maximal performance for scientific users. We propose to introduce Artificial Intelligence (AI) as a way to automatically and quickly optimize the performance and broadest use of recent NSF-supported advanced computing resources. Through this pilot effort our ultimate aim is to enable and accelerate scientific advances in widely diverse fields such as biology, chemistry, oceanography, materials science, climate modeling, and cosmology.

As the research cyberinfrastructure grows rapidly in scale and complexity, it is essential to integrate new technologies based on Machine Learning (ML) and AI to ensure that the investments in new hardware and software components result in proportional improvements in performance and capability. This project will undertake a transformative research activity targeting: (1) scaling ML algorithms to make them easily available to the scientific community; and (2) improving cyberinfrastructure efficiency through AI-based predictive models. This technical work will be complemented and informed by a community engagement effort to jointly catalog the state of the art and identify future challenges and opportunities in enabling a new smart cyberinfrastructure.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Robust and Scalable Multi-Fidelity Algorithms for Model-Based Predictions

Akil Narayan
Modern computational models are complex in nature: accurate predictions of physics require detailed and intensive computational resources. As such, development of accurate scientific models has been the area of research emphasis in recent decades. Today’s scientific models involve largescale simulation tools, often with many interdependent components, and sometimes requiring days to complete a single simulation. Adding to this complexity is the presence of uncertainty, which is often encoded into models via parameters or random variables. Any direct approach to analyze the impact of parametric variation on such expensive models is infeasible.
One approach to circumvent this limitation is to utilize hierarchies of models, each with differing computational costs and predictive fidelities. Research in the past few years has demonstrated that intelligent allocation of resources across this ensemble of models can produce predictions with much greater accuracy than concentrating all resources in a single model. Such multi-fidelity procedures hold the potential to optimally utilize ensembles of models to make predictions.

The main components of this proposed project address optimal resource allocation and robust and scalable model reduction, generation, and learning via low-rank multi-fidelity and multilevel procedures. The overall goal is the construction of surrogate models with accuracy guarantees that can be used in design optimization, inference, and general uncertainty quantification scenarios. The tasks associated with this project involve fundamental mathematical and algorithmic advances in low-rank multi-fidelity methods. Error certificates to ensure accuracy will be developed when possible. Kernel learning techniques will be employed to explore problem-dependent low-rank structure and optimize allocation of resources. Algorithmic methods to handle heterogeneous models, data, and parameter spaces will be developed resulting in a comprehensive framework for utilizing low-rank multi-fidelity methods.

The multi-fidelity procedures devised in this project will also aid in developing novel strategies for model comparison, ranking, discrimination, and genesis. Model comparison and ranking will enable development of a comprehensive multi-fidelity pipeline to automatically learn and update model hierarchies and fidelities. Model generation using the simulation data from a multi-fidelity pipeline allows the automated construction of model emulators that can more easily be explored to detect and exploit low-rank structure.

This project will explore usage of low-rank multi-fidelity methods in two main application areas. The first area is in robust design under uncertainty, which requires robust, accurate, and efficient forward model evaluations. The second area of application is in statistical inference, requiring computationally expensive exploration of posterior distributions. This project will demonstrate the utility of low-rank multi-fidelity methods in acceleration of robust design and inferential tasks. Problems addressed by the work in this project include simulations in topology optimization, nonlocal/fractional differential equation models, modeling of multi-physics solar power receivers, and supersonic channel flow.

UINTAH + HEDGEHOG -- Hybrid Task Graph Execution Library Development for Generalized Work Loads

Martin Berzins
The Overall Objective is to develop a new Uintah runtime environment that demonstrates a flexible approach for accommodating different task execution and state management strategies consistent with a starting point:

1. Uintah uses an asynchronous manytask (AMT) approach that has been shown to strong and weak scale to 256K cores with 16K GPUs on Titan and 768K cores on Mira, through its asynchronous adaptive and over-decomposition based runtime scheduler. This scheduler works on many different and diverse architectures, from many DOE and NSF leadership class machines to Chinas Sunway Tiahulight. In addition this AMT approach when combined with mesh coarsening allows for an efficient approach to resilience.

2. HTGS/Hedgehog is a high performance single node multi-CPU/GPU tasked based system developed at NIST. Internal state management and execution strategies at the level of a single node is maintained within an explicit task graph representation. HTGS/Hedgehog has produced good competitive results on a single node.

3. Demonstrating that the integration of two different task execution paradigms and the sharing of both local and global state can occur with minimal changes to either libraries.

The objective is to integrate the HTGS/Hedgehog Task Graph library into the Uintah Runtime. This new runtime would combine the global state management and multi-nodal execution characteristics of Uintah with the local single node execution facilities of HTGS/Hedgehog. This work would demonstrate and show how state management would be managed with these two different libraries. While the two libraries share many commonalities and architectural similarities, they are distinct in the underlying implementation. Understanding and developing a robust mechanism for sharing global and local state between the two libraries along with integrating the overall resource management strategies and task execution for multiple CPU/GPU architectures is the focus of this work.

The objectives will be carried out by first conducting feasibility studies between two different applications (3D structured grid application and an imaging analysis application) followed by the prototype implementation of new Uintah Scheduler that integrates the HTGS/Hedgehog library at the nodal level. The two different applications will be used demonstrate scalability and performance on both single node and multi-node systems. Finally, the proof-of-concept prototype Uintah Scheduler implementation will be transformed into a production level system in the third year of this effort.

Portable Applications Driven Approach to Scalability on Frontera and Future Exascale Systems

Martin Berzins
The present uncertainty in computer architectures requires software design to allow applications codes to both be able to scale across 20K to 100K nodes and to be able to run portably on a range of possible nodal architectures with a variety of processor technologies being involved, ranging from i86, ARM, GPU to possibly FPGAs. At the same time it isi important to use challenging applications to validate the software solutions and to ensure that they are realistic. This project led by Professor Martin Berzins will use the Frontera system to help address and demonstrate portability for an important class of engineering applications using the Uintah software.

Uintah software employs an asynchronous many task-based approach that has proved to be exceptionally robust at enabling complex engineering applications to run at scale on a broad range of architectures. As new and different architectures require not only the ability to execute tasks asynchronously but to deal with memory hierarchies and to execute efficiently on i86 architectures to GPUs and to a broad range of other possible architectures. Uintah use an approach based upon the Kokkos portability library that makes it possible to build a simple clean loop level interface that enables the loops themselves to execute efficiently on different architectures.

The work program will first port and evaluate existing Uintah architectures to Frontera and then consider new applications that apply the Uintah methodology to areas such as unstructured mesh calculations and particle methods applied to biomedical problems. The work program described here covers the application of these ideas to Frontera. The main effort will be through other funded projects, but any funding variability will be accomodated through an adaptive appropach to the applications space.

Collaborative Research: Detecting and Preventing Covid-19 with Privacy-Preserving Decentralized Machine Learning

Bao Wang
We are facing scientific challenges caused by the COVID-19, including detecting COVID-19 accurately and preventing its spread efficiently. Cutting-edge machine learning technologies, especially modern deep learning arts, provide feasible avenues to resolve these challenges. Deep learning-based computational imaging algorithms facilitate accurate and rapid COVID-19 diagnosis; sequential modeling with recurrent neural networks or transformers enables accurate and real-time COVID-19 spread prediction. However, most existing black-box deep learning research on the COVID-19 is the alchemy of turning unstructured data into gold and based on systematic trial and error. The current deep learning-based COVID-19 research raises many untrustworthy issues, including unreliable diagnosis, data privacy sacrifice, and lack of interpretability. Lacking interpretable and reliable predictions puts substantial strains on practitioners to leverage deep learning approaches to detect and prevent COVID-19. Data privacy constraints bring us many unraveling challenges. Thus, developing trustworthy machine learning algorithms while preserving data privacy is crucial to detect and prevent COVID-19.

We are a team of researchers with different expertise and common research interests, who jointly seek to develop theoretically principled decentralized machine learning algorithms that can provide reliable predictions. Furthermore, we focus on applying these machine learning algorithms to accurately and rapidly diagnose COVID-19 patients and predict the virus spread. We propose a challenging but walkable path towards developing privacy-preserving machine learning algorithms to detect and prevent COVID-19. We will integrate our expertise synergistically to develop privacy-preserving decentralized machine learning algorithms with performance guarantees and a high-throughput and low-latency software package to accurately and rapidly detect COVID-19 and effectively prevent its spread. As such, we propose three interconnected thrusts to develop novel neural network architectures based on mathematical principles, efficient privacy-preserving decentralized optimization algorithms, algorithms for spatiotemporal data forecasting and medical image processing and analysis, and an integrated software package to assist fighting against the coronavirus. Each thrust contains multiple theoretical explorations and numerical validation.

Intellectual Merit:
The proposal's intellectual merit include: (i) development of robust and mathematically principled recurrent neural networks for accurate real-time spatio-temporal forecasting, (ii) development of novel efficient federated and decentralized machine learning algorithms with a performance guarantee, (iii) leveraging the stochastic differential equations theory to develop new privacy-preserving machine learning mechanisms, (iv) construction of new epidemiology models-principled recurrent neural networks with accurate and interpretable predictions, (v) development of trustworthy deep learning-based frameworks for COVID-19 diagnosis from multi-modal medical measurements.

Broader Impacts:
The broader impacts of this project are in applying the proposed algorithms and their analysis over a wide range of science and engineering disciplines, such as scientific and medical image analysis, epidemic forecasting, patient monitoring, and microscopic imaging. The projects shall train a diverse body of the graduate and undergraduate students at Michigan State University, the University of Kentucky, and the University of Utah through collaborative education and research activities in applied mathematics, statistics, computer science, data science, physics, and social science. The project also plans to have research activities involving under-represented students in three universities located in three states. Besides the interdisciplinary collaboration across other institutions, we also aim to establish industrial partnerships to extend the proposed project's impact. The developed software will be shared with the general public through Github.

Sub-Pilot-Scale Production of High-Value Products for U.S. Coals

Chris Johnson
The primary objectives of this project are to: 1) provide sub-pilot scale verification of lab-scale developments on the production of isotropic and mesophase coal-tar pitch (CTP) for carbon fiber production, using coals from five U.S. coal-producing regions (UT, WY, WV, AK, IL); 2) investigate the production of a high-value b-SiC byproduct using residual coal char from the tar production process, and 3) develop an extensive database and suite of tools for data analysis and economic modeling, to relate process conditions to product quality, to assess the economic viability of coals from different regions for producing specific high-value products.

The University of Utah will use a 0.5 ton/day rotary reactor to pyrolyze coals to produce tars suitable for upgrading to coal tar pitch. The same reactor technology will be used in a second stage to perform the tar upgrading to either mesophase or isotropic pitch, depending on the properties of the original coal. The University of Wyoming will spin the product pitch into carbon fiber, to assess fiber quality arising from different coals and from different processing conditions. The solid char byproduct from coal pyrolysis will be used by the University of Wyoming to produce b-SiC. The University of Utah will work with Marshall University to develop a novel database, coupled with detailed economic models and analysis tools, to provide a means for understanding correlations between coal properties, process conditions and product quality, to allow assessment of the potential economic viability of coals from different regions for producing specific high-value products. Access to these some of these computational tools will become available to the public through a web-based community portal.

This effort is a major step towards providing a low-cost carbon fiber product from coal for potential use in automotive and other important markets, and will also lead to new economic development opportunities for communities with coal-based economies.

Experimental Characterization and Modeling of Failure in Post-Buckled Composite Stiffened Panels with a Scarf Repair

Alliance for Multiscale Modeling of Electronic Materials for an Energy Efficient Army

Mike Kirby
The objective of this Alliance is to conduct fundamental research to create MSME to support development of future electronic materials and devices for the Army. The U.S. Army Research Laboratory (ARL) envisions the MultiScale multidisciplinary Modeling of Electronic materials (MSME) Collaborative Research Alliance (CRA) which will bring together government, industrial, and academic institutions to undertake the fundamental research necessary to enable the quantitative understanding of electronic materials from the smallest to the largest relevant scales.

Augmented Design Through Analysis and Visualization Facilitating Better Designs and Enhanced Designers

In Situ Feature Extraction and Visualization from Discontinuous Galerkin Based High-Order Methods

Mike Kirby
The use of simulation science as a means of scientific inquiry is increasing at a tremendous rate. The process of mathematically modeling physical phenomena, estimating key modeling parameters, numerically approximating the solution, and computationally solving the resulting algorithm has inundated the scientific and engineering worlds, allowing for rapid advances in our understanding and utilization of the world around us. The efficacy of simulation science has been, in part, due to two critical components: (1) the identification and minimization of the error budget (e.g. modeling, discretization and uncertainty errors), and equally importantly, (2) evaluation mechanisms (such as visualization) by which the investigator assimilates the data produced through simulation. The latter allows for further refinement of the simulation science process (through model correction, increased numerical resolution, or algorithm debugging, etc.) and makes possible scientific statements about the physical phenomena being investigated.

Tremendous effort has been exerted over many decades in the pursuit of numerical methods that are both flexible and accurate, hence providing sufficient fidelity to be employed in the numerical solution of a large number of models, and sufficient analysis of accuracy to allow researchers to focus their attention on model refinement and uncertainty quantification. High-order finite element methods (also known as spectral/hp element methods), using either the continuous Galerkin or discontinuous Galerkin formulation, have reached a level of sophistication that allows them to be commonly applied to a diverse set of real-life engineering problems in computational solid mechanics, fluid dynamics, acoustics and electromagnetics. Many of the physical problems of interest are, unfortunately, not steady-state --- leading to simulations that must run for a long time (days, weeks and in some cases months). Thus, in the absence of creative solutions, datasets can easily consume all available storage and networking resources. Examples of such simulations within fluid dynamics include all simulations in which the fluid is in transition or fully turbulent. With regards to ARO interests, problems in turbo-machinery and rotorcraft, where aspects of the geometry are rotating and/or sliding past one other, fall into this category. High-order finite element methods are now beginning to be used to simulate these physical systems due to their inherent ability to capture complex structures (such as vortices) with little numerical dissipation and dispersion. The transient nature of these simulations complicates the data handling (post processing requires the time history) and renders single snap-shots of the solution insufficient to understand the time-varying nature of the physics.

Objective
Our research objectives are two-fold: (1) We will generate "high-order FEM" appropriate dimensionality reduction feature extraction methods such as vortex cores which can be accomplished as part of an in situ data processing pipeline. (2) Given the exploratory nature inherent in analyzing and visualizing transient phenomena, we may specify regions of interest in an in situ fashion within a simulation field based upon the visualization objective, extract and transmit the result of working on relevant high-order FEM information to our visualization system, and then reconstruct the visualization features of interest with the cognizance of V&V.

Publications in Scientific Computing:

Page 5 of 28

Start
Prev
1
2
3
4
5
6
7
8
9
10
Next
End

Adaptive Random Walk Gradient Descent for Decentralized Optimization
T. Sun, D. Li, B. Wang. In Proceedings of the 39th International Conference on Machine Learning, 2022.

In this paper, we study the adaptive step size random walk gradient descent with momentum for decentralized optimization, in which the training samples are drawn dependently with each other. We establish theoretical convergence rates of the adaptive step size random walk gradient descent with momentum for both convex and nonconvex settings. In particular, we prove that adaptive random walk algorithms perform as well as the nonadaptive method for dependent data in general cases but achieve acceleration when the stochastic gradients are “sparse”. Moreover, we study the zeroth-order version of adaptive random walk gradient descent and provide corresponding convergence results. All assumptions used in this paper are mild and general, making our results applicable to many machine learning problems.

NSDF-FUSE: A Testbed for Studying Object Storage via FUSE File Systems
P. Olaya, J. Luettgau, N. Zhou, J. Lofstead, G. Scorzelli, V. Pascucci, M. Taufer. In Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, Association for Computing Machinery, pp. 277–278. 2022.
ISBN: 9781450391993
DOI: 10.1145/3502181.3533709

This work presents NSDF-FUSE, a testbed for evaluating settings and performance of FUSE-based file systems on top of S3-compatible object storage; the testbed is part of a suite of services from the National Science Data Fabric (NSDF) project (an NSF-funded project that is delivering cyberinfrastructures for data scientists). We demonstrate how NSDF-FUSE can be deployed to evaluate eight different mapping packages that mount S3-compatible object storage to a file system, as well as six data patterns representing different I/O operations on two cloud platforms. NSDF-FUSE is open-source and can be easily extended to run with other software mapping packages and different cloud platforms.

Decomposing Temporal High-Order Interactions via Latent ODEs
S. Li, R.M. Kirby, S. Zhe. In Proceedings of the 39 th International Conference on Machine Learning, 2022.

High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot well exploit the valuable timestamp information in data. The existent methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable enough of capturing complex, finegrained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.

Bayesian Continuous-Time Tucker Decomposition
S. Fang, A. Narayan, R.M. Kirby, S. Zhe. In Proceedings of the 39 th International Conference on Machine Learning, 2022.

Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition (BCTT). We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains the interpretability while is flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message-passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.

Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization
Subtitled “arXiv:2205.03681,” V. Keshavarzzadeh, R.M. Kirby, A. Narayan. 2022.

Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information about the parameters and the information from the observations via likelihood evaluations are incorporated into the inference process. In this paper, we adopt a similar viewpoint with a slightly different numerical procedure from standard inference approaches to provide insight about the localized behavior of unknown underlying parameters. We present a variational inference approach which mainly incorporates the observation data in a point-wise manner, i.e. we invert a limited number of observation data leveraging the gradient information of the forward map with respect to parameters, and find true individual samples of the latent parameters when the forward map is noise-free and one-to-one. For statistical calculations (as the ultimate goal in simulations), a large number of samples are generated from a trained neural network which serves as a transport map from the prior to posterior latent parameters. Our neural network machinery, developed as part of the inference framework and referred to as Neural Net Kernels (NNK), is based on hierarchical (deep) kernels which provide greater flexibility for training compared to standard neural networks. We showcase the effectiveness of our inference procedure in identifying bimodal and irregular distributions compared to a number of approaches including Markov Chain Monte Carlo sampling approaches and a Bayesian neural network approach.

Advancing Reproducibility in Parallel and Distributed Systems Research
M. Parashar. In Computer, Vol. 55, No. 5, pp. 4--5. 2022.
DOI: 10.1109/MC.2022.3158156

This installment of Computer’s series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Parallel and Distributed Systems.

Porting Uintah to Heterogeneous Systems,
J.K. Holmen, D. Sahasrabudhe, M. Berzins. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC22) Best Paper Award, ACM, 2022.

The Uintah Computational Framework is being prepared to make portable use of forthcoming exascale systems, initially the DOE Aurora system through the Aurora Early Science Program. This paper describes the evolution of Uintah to be ready for such architectures. A key part of this preparation has been the adoption of the Kokkos performance portability layer in Uintah. The sheer size of the Uintah codebase has made it imperative to have a representative benchmark. The design of this benchmark and the use of Kokkos within it is discussed. This paper complements recent work with additional details and new scaling studies run 24x further than earlier studies. Results are shown for two benchmarks executing workloads representative of typical Uintah applications. These results demonstrate single-source portability across the DOE Summit and NSF Frontera systems with good strong-scaling characteristics. The challenge of extending this approach to anticipated exascale systems is also considered.

Integrating atomistic simulations and machine learning to design multi-principal element alloys with superior elastic modulus
M. Grant, M. R. Kunz, K. Iyer, L. I. Held, T. Tasdizen, J. A. Aguiar, P. P. Dholabhai. In Journal of Materials Research, Springer International Publishing, pp. 1--16. 2022.

Multi-principal element, high entropy alloys (HEAs) are an emerging class of materials that have found applications across the board. Owing to the multitude of possible candidate alloys, exploration and compositional design of HEAs for targeted applications is challenging since it necessitates a rational approach to identify compositions exhibiting enriched performance. Here, we report an innovative framework that integrates molecular dynamics and machine learning to explore a large chemical-configurational space for evaluating elastic modulus of equiatomic and non-equiatomic HEAs along primary crystallographic directions. Vital thermodynamic properties and machine learning features have been incorporated to establish fundamental relationships correlating Young’s modulus with Gibbs free energy, valence electron concentration, and atomic size difference. In HEAs, as the number of elements increases …

Organizing Large Data Sets for Efficient Analyses on HPC Systems
J. Gu, P. Davis, G. Eisenhauer, W. Godoy, A. Huebl, S. Klasky, M. Parashar, N. Podhorszki, F. Poeschel, J. Vay, L. Wan, R. Wang, K. Wu. In Journal of Physics: Conference Series, Vol. 2224, No. 1, IOP Publishing, pp. 012042. 2022.

Upcoming exascale applications could introduce significant data management challenges due to their large sizes, dynamic work distribution, and involvement of accelerators such as graphical processing units, GPUs. In this work, we explore the performance of reading and writing operations involving one such scientific application on two different supercomputers. Our tests showed that the Adaptable Input and Output System, ADIOS, was able to achieve speeds over 1TB/s, a significant fraction of the peak I/O performance on Summit. We also demonstrated the querying functionality in ADIOS could effectively support common selective data analysis operations, such as conditional histograms. In tests, this query mechanism was able to reduce the execution time by a factor of five. More importantly, ADIOS data management framework allows us to achieve these performance improvements with only a minimal amount …

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs
Subtitled “arXiv preprint arXiv:2204.08621,” J. Baker, H. Xia, Y. Wang, E. Cherkaev, A. Narayan, L. Chen, J. Xin, A. L. Bertozzi, S. J. Osher, B. Wang. 2022.

Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.

ENO-Based High-Order Data-Bounded and Constrained Positivity-Preserving Interpolation
Subtitled “https://arxiv.org/abs/2204.06168,” T.A.J. Ouermi, R.M. Kirby, M. Berzins. In Numerical Algorithms, 2022.

A number of key scientific computing applications that are based upon tensor-product grid constructions, such as numerical weather prediction (NWP) and combustion simulations, require property-preserving interpolation. Essentially Non-Oscillatory (ENO) interpolation is a classic example of such interpolation schemes. In the aforementioned application areas, property preservation often manifests itself as a requirement for either data boundedness or positivity preservation. For example, in NWP, one may have to interpolate between the grid on which the dynamics is calculated to a grid on which the physics is calculated (and back). Interpolating density or other key physical quantities without accounting for property preservation may lead to negative values that are nonphysical and result in inaccurate representations and/or interpretations of the physical data. Property-preserving interpolation is straightforward when used in the context of low-order numerical simulation methods. High-order property-preserving interpolation is, however, nontrivial, especially in the case where the interpolation points are not equispaced. In this paper, we demonstrate that it is possible to construct high-order interpolation methods that ensure either data boundedness or constrained positivity preservation. A novel feature of the algorithm is that the positivity-preserving interpolant is constrained; that is, the amount by which it exceeds the data values may be strictly controlled. The algorithm we have developed comes with theoretical estimates that provide sufficient conditions for data boundedness and constrained positivity preservation. We demonstrate the application of our algorithm on a collection of 1D and 2D numerical examples, and show that in all cases property preservation is respected.

Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures
Subtitled “arXiv:2204.04273,” J.D. Hogue, R.M. Kirby, A. Narayan. 2022.

Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network.

Portable, Scalable Approaches For Improving Asynchronous Many-Task Runtime Node Use
John Holmen. School of Computing, University of Utah, 2022.

This research addresses node-level scalability, portability, and heterogeneous computing challenges facing asynchronous many-task (AMT) runtime systems. These challenges have arisen due to increasing socket/core/thread counts and diversity among supported architectures on current and emerging high-performance computing (HPC) systems. This places greater emphasis on thread scalability and simultaneous use of diverse architectures to maximize node use and is complicated by architecture-specific programming models.

To reduce the exposure of application developers to such challenges, AMT programming models have emerged to offer a runtime-based solution. These models overdecompose a problem into many fine-grained tasks to be scheduled and executed by an underlying runtime to improve node-level concurrency. However, task execution granularity challenges remain, and it is unclear where and how shared memory programming models should be used within an AMT model to improve node use. This research aims to ease these design decisions with consideration for performance portability layers (PPLs), which provide a single interface to multiple shared memory programming models.

The contribution of this research is the design of a task scheduling approach for portably improving node use when extending AMT runtime systems to many-core and heterogeneous HPC systems with shared memory programming models. The success of this approach is shown through the portable adoption of a performance portability layer, Kokkos, within Uintah, a representative AMT runtime system. The resulting task scheduler enables the scheduling and execution of portable, fine-grained tasks across processors and accelerators simultaneously with flexible control over task execution granularity. A collection of experiments on current many-core and heterogeneous HPC systems are used to validate this approach and inform design recommendations. Among resulting recommendations are approaches for easing the adoption of a heterogeneous MPI+PPL task scheduling approach in an asynchronous many-task runtime system and furthermore to ease indirect adoption of a performance portability layer in large legacy codebases.

Convex Optimization-Based Structure-Preserving Filter For Multidimensional Finite Element Simulations
Subtitled “arXiv preprint arXiv:2203.09748,” V. Zala, A. Narayan, R.M. Kirby. 2022.

In simulation sciences, it is desirable to capture the real-world problem features as accurately as possible. Methods popular for scientific simulations such as the finite element method (FEM) and finite volume method (FVM) use piecewise polynomials to approximate various characteristics of a problem, such as the concentration profile and the temperature distribution across the domain. Polynomials are prone to creating artifacts such as Gibbs oscillations while capturing a complex profile. An efficient and accurate approach must be applied to deal with such inconsistencies in order to obtain accurate simulations. This often entails dealing with negative values for the concentration of chemicals, exceeding a percentage value over 100, and other such problems. We consider these inconsistencies in the context of partial differential equations (PDEs). We propose an innovative filter based on convex optimization to deal with the inconsistencies observed in polynomial-based simulations. In two or three spatial dimensions, additional complexities are involved in solving the problems related to structure preservation. We present the construction and application of a structure-preserving filter with a focus on multidimensional PDEs. Methods used such as the Barycentric interpolation for polynomial evaluation at arbitrary points in the domain and an optimized root-finder to identify points of interest improve the filter efficiency, usability, and robustness. Lastly, we present numerical experiments in 2D and 3D using discontinuous Galerkin formulation and demonstrate the filter's efficacy to preserve the desired structure. As a real-world application …

Reinventing High Performance Computing: Challenges and Opportunities
Subtitled “UUSCI-2022-001,” D. Reed, D. Gannon, J. Dongarra. University of Utah, 2022.

The world of computing is in rapid transition, now dominated by a world of smartphones and cloud services, with profound implications for the future of advanced scientific computing. Simply put, high-performance computing (HPC) is at an important inflection point. For the last 60 years, the world's fastest supercomputers were almost exclusively produced in the United States on behalf of scientific research in the national laboratories. Change is now in the wind. While costs now stretch the limits of U.S. government funding for advanced computing, Japan and China are now leaders in the bespoke HPC systems funded by government mandates. Meanwhile, the global semiconductor shortage and political battles surrounding fabrication facilities affect everyone. However, another, perhaps even deeper, fundamental change has occurred. The major cloud vendors have invested in global networks of massive scale systems that dwarf today's HPC systems. Driven by the computing demands of AI, these cloud systems are increasingly built using custom semiconductors, reducing the financial leverage of traditional computing vendors. These cloud systems are now breaking barriers in game playing and computer vision, reshaping how we think about the nature of scientific computation. Building the next generation of leading edge HPC systems will require rethinking many fundamentals and historical approaches by embracing end-to-end co-design; custom hardware configurations and packaging; large-scale prototyping, as was common thirty years ago; and collaborative partnerships with the dominant computing ecosystem companies, smartphone, and cloud computing vendors.

Learning POD of Complex Dynamics Using Heavy-ball Neural ODEs
Subtitled “arXiv:2202.12373,” J. Baker, E. Cherkaev, A. Narayan, B. Wang. 2022.

Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. In this paper, we leverage the recently proposed heavy-ball neural ODEs (HBNODEs) [Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots generated from solving full order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-term dependencies effectively from sequential observations and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von Kármán Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.

Computational Error Estimation for The Material Point Method
M. Berzins. In Computational Particle Mechanics, Springer, 2022.
DOI: https://doi.org/10.1007/s40571-022-00530-5

A common feature of many methods in computational mechanics is that there is often a way of estimating the error in the computed solution. The situation for computational mechanics codes based upon the Material Point Method is very different in that there has been comparatively little work on computable error estimates for these methods. This work is concerned with introducing such an approach for the Material Point Method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made more precise the sources and forms of the different MPM errors. There is then a need to estimate these errors computationally through computable estimates of the different errors in the material point method. Estimates of the different spatial errors in the Material Point Method are constructed based upon nodal derivatives of the different physical variables in MPM. These derivatives are then estimated using standard difference approximations calculated on the background mesh. The use of these estimates of the spatial error makes it possible to measure the growth of errors over time. A number of computational experiments are used to illustrate the performance of the computed error estimates. As the key feature of the approach is the calculation of derivatives on the regularly spaced background mesh, the extension to calculating derivatives and hence to error estimates for higher dimensional problems is clearly possible.

A Stieltjes algorithm for generating multivariate orthogonal polynomials
Subtitled “arXiv preprint arXiv:2202.04843,” Z. Liu, A. Narayan. 2022.

Orthogonal polynomials of several variables have a vector-valued three-term recurrence relation, much like the corresponding one-dimensional relation. This relation requires only knowledge of certain recurrence matrices, and allows simple and stable evaluation of multivariate orthogonal polynomials. In the univariate case, various algorithms can evaluate the recurrence coefficients given the ability to compute polynomial moments, but such a procedure is absent in multiple dimensions. We present a new Multivariate Stieltjes (MS) algorithm that fills this gap in the multivariate case, allowing computation of recurrence matrices assuming moments are available. The algorithm is essentially explicit in two and three dimensions, but requires the numerical solution to a non-convex problem in more than three dimensions. Compared to direct Gram-Schmidt-type orthogonalization, we demonstrate on several examples in up to three dimensions that the MS algorithm is far more stable, and allows accurate computation of orthogonal bases in the multivariate setting, in contrast to direct orthogonalization approaches.

Energy conservation and accuracy of some MPM formulations
M. Berzins. In Computational Particle Mechanics, 2022.
DOI: 10.1007/s40571-021-00457-3

The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method such as time integration accuracy and energy conservation. The traditional MPM time integration methods are often based upon the symplectic Euler method or staggered central differences. This raises the question of how to best ensure energy conservation in explicit time integration for MPM. Two approaches are used here, one is to extend the Symplectic Euler method (Cromer Euler) to provide better energy conservation and the second is to use a potentially more accurate symplectic methods, namely the widely-used Stormer-Verlet Method. The Stormer-Verlet method is shown to have locally third order time accuracy of energy conservation in time, in contrast to the second order accuracy in energy conservation of the symplectic Euler methods that are used in many MPM calculations. It is shown that there is an extension to the Symplectic Euler stress-last method that provides better energy conservation that is comparable with the Stormer-Verlet method. This extension is referred to as TRGIMP and also has third order accuracy in energy conservation. When the interactions between space and time errors are studied it is seen that spatial errors may dominate in computed quantities such as displacement and velocity. This connection between the local errors in space and time is made explicit mathematically and explains the observed results that displacement and velocity errors are very similar for both methods. The observed and theoretically predicted third-order energy conservation accuracy and computational costs are demonstrated on a standard MPM test example.

AMM: Adaptive Multilinear Meshes
Subtitled “arXiv:2007.15219,” H. Bhatia, D. Hoang, N. Morrical, V. Pascucci, P.T. Bremer, P. Lindstrom. 2021.

Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision datastreams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes

Page 5 of 28

Start
Prev
1
2
3
4
5
6
7
8
9
10
Next
End

SCI