P. Agrawal, R. T. Whitaker, S. Y. Elhabian. Learning Deep Features for Shape Correspondence with Domain Invariance, Subtitled arXiv preprint arXiv:2102.10493, 2021.
Correspondence-based shape models are key to various medical imaging applications that rely on a statistical analysis of anatomies. Such shape models are expected to represent consistent anatomical features across the population for population-specific shape statistics. Early approaches for correspondence placement rely on nearest neighbor search for simpler anatomies. Coordinate transformations for shape correspondence hold promise to address the increasing anatomical complexities. Nonetheless, due to the inherent shape-level geometric complexity and population-level shape variation, the coordinate-wise correspondence often does not translate to the anatomical correspondence. An alternative, group-wise approach for correspondence placement explicitly models the trade-off between geometric description and the population's statistical compactness. However, these models achieve limited success in resolving nonlinear shape correspondence. Recent works have addressed this limitation by adopting an application-specific notion of correspondence through lifting positional data to a higher dimensional feature space. However, they heavily rely on manual expertise to create domain-specific features and consistent landmarks. This paper proposes an automated feature learning approach, using deep convolutional neural networks to extract correspondence-friendly features from shape ensembles. Further, an unsupervised domain adaptation scheme is introduced to augment the pretrained geometric features with new anatomies. Results on anatomical datasets of human scapula, femur, and pelvis bones demonstrate that …
A. Bagherinezhad, M. Young, Bei Wang, M. Parvania. Spatio-Temporal Visualization of Interdependent Battery Bus Transit and Power Distribution Systems, In IEEE PES Innovative Smart Grid Technologies Conference (ISGT), IEEE, 2021.
The high penetration of transportation electrification and its associated charging requirements magnify the interdependency of the transportation and power distribution systems. The emergent interdependency requires that system operators fully understand the status of both systems. To this end, a visualization tool is presented to illustrate the interdependency of battery bus transit and power distribution systems and the associated components. The tool aims at monitoring components from both systems, such as the locations of electric buses, the state of charge of batteries, the price of electricity, voltage, current, and active/reactive power flow. The results showcase the success of the visualization tool in monitoring the bus transit and power distribution components to determine a reliable cost-effective scheme for spatio-temporal charging of electric buses.
M. K. Ballard, R. Amici, V. Shankar, L. A. Ferguson, M. Braginsky, R. M. Kirby. Towards an Extrinsic, CG-XFEM Approach Based on Hierarchical Enrichments for Modeling Progressive Fracture, Subtitled arXiv preprint arXiv:2104.14704, 2021.
We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D static elasticity with linearized strain for modeling open cracks as a salient step towards modeling progressive fracture. The implementation includes a description of the finite element model, hybrid implicit/explicit representation of enrichments, numerical integration method, and novel degree-of-freedom (DoF) enumeration algorithm. This algorithm supports an arbitrary number of enrichments within an element, while simultaneously maintaining a CG solution across elements. Additionally, our approach easily allows an implementation suitable for distributed computing systems. Enabled by the DoF enumeration algorithm, the proposed method lays the groundwork for a computational tool that efficiently models progressive fracture. To facilitate a discussion of the complex enrichment hierarchies, we develop enrichment diagrams to succinctly describe and visualize the relationships between the enrichments (and the fields they create) within an element. This also provides a unified language for discussing extrinsic XFEM methods in the literature. We compare several methods, relying on the enrichment diagrams to highlight their nuanced differences.
M. Berzins. Symplectic Time Integration Methods for the Material Point Method, Experiments, Analysis and Order Reduction, In WCCM-ECCOMAS2020 virtual Conference, January, 2021.
The provision of appropriate time integration methods for the Material Point Method (MPM) involves considering stability, accuracy and energy conservation. A class of methods that addresses many of these issues is the widely used family of symplectic time integration methods. Such methods have good conservation properties and have the potential to achieve high accuracy. In this work we build on previous work and consider high order methods for the time integration of the Material Point Method. The results of practical experiments show that while high order methods in both space and time have good accuracy initially, unless the problem has relatively little particle movement, the accuracy of the methods at later times is closer to that of low order methods. A theoretical analysis explains these results as being similar to the stage error found in Runge-Kutta methods, though in this case the stage error arises from the MPM differentiations and interpolations from particles to grid and back again, particularly in cases in which there are many grid crossings.
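To make the conservation property mentioned in this abstract concrete, the following is a minimal sketch of one common symplectic scheme, velocity Verlet (Störmer-Verlet), applied to a unit-mass harmonic oscillator. It is not the MPM integrator studied in the paper; the function names, the test problem, and the step size are all illustrative assumptions.

```python
def velocity_verlet(x, v, dt, steps, force):
    """Advance (x, v) with velocity Verlet; returns the list of visited states."""
    history = [(x, v)]
    for _ in range(steps):
        v_half = v + 0.5 * dt * force(x)   # half kick with the current force
        x = x + dt * v_half                # drift with the half-step velocity
        v = v_half + 0.5 * dt * force(x)   # half kick with the updated force
        history.append((x, v))
    return history


def energy(x, v):
    """Total energy of the assumed unit-mass, unit-stiffness oscillator."""
    return 0.5 * v * v + 0.5 * x * x


def max_energy_drift(dt=0.1, steps=10000):
    """Largest deviation from the initial energy over the whole run."""
    history = velocity_verlet(1.0, 0.0, dt, steps, lambda x: -x)
    e0 = energy(*history[0])
    return max(abs(energy(x, v) - e0) for x, v in history)
```

Calling `max_energy_drift()` on this toy problem returns a small, bounded energy error rather than one that grows with the number of steps, which is the behavior that motivates symplectic schemes.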
Electrocardiographic forward problems are crucial components for noninvasive electrocardiographic imaging (ECGI) that compute torso potentials from cardiac source measurements. Forward problems have few sources of error as they are physically well posed and supported by mature numerical and computational techniques. However, the residual errors reported from experimental validation studies between forward computed and measured torso signals remain surprisingly high.
This study tests the hypothesis that incomplete cardiac source sampling, especially above the atrioventricular (AV) plane, is a major contributor to forward solution errors.
We used a modified Langendorff preparation suspended in a human-shaped electrolytic torso-tank and a novel pericardiac-cage recording array to thoroughly sample the cardiac potentials. With this carefully controlled experimental preparation, we minimized possible sources of error, including geometric error and torso inhomogeneities. We progressively removed recorded signals from above the atrioventricular plane to determine how the forward-computed torso-tank potentials were affected by incomplete source sampling.
We studied 240 beats total recorded from three different activation sequence types (sinus, and posterior and anterior left-ventricular free-wall pacing) in each of two experiments. With complete sampling by the cage electrodes, all correlation metrics between computed and measured torso-tank potentials were above 0.93 (maximum 0.99). The mean root-mean-squared error across all beat types was also low, less than or equal to 0.10 mV. A precipitous drop in forward solution accuracy was observed when we included only cage measurements below the AV plane.
First, our forward computed potentials using complete cardiac source measurements set a benchmark for similar studies. Second, this study validates the importance of complete cardiac source sampling above the AV plane to produce accurate forward computed torso potentials. Testing ECGI systems and techniques with these more complete and highly accurate datasets will improve inverse techniques and noninvasive detection of cardiac electrical abnormalities.
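The two kinds of agreement metric reported in this abstract, correlation and root-mean-squared error between computed and measured potentials, can be sketched as follows. This is a generic illustration, not the authors' analysis code; the function names are assumptions, and any input values would be made-up samples rather than data from the study.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

Applied to one beat, `pearson_correlation(computed, measured)` and `rmse(computed, measured)` would be evaluated over the potentials at all torso electrodes, with the RMSE in the same units as the inputs (mV in the abstract).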
R. Bhalodia, S. Elhabian, L. Kavan, R. Whitaker. Leveraging Unsupervised Image Registration for Discovery of Landmark Shape Descriptor, In Medical Image Analysis, Elsevier, pp. 102157. 2021.
In current biological and medical research, statistical shape modeling (SSM) provides an essential framework for the characterization of anatomy/morphology. Such analysis is often driven by the identification of a relatively small number of geometrically consistent features found across the samples of a population. These features can subsequently provide information about the population shape variation. Dense correspondence models can provide ease of computation and yield an interpretable low-dimensional shape descriptor when followed by dimensionality reduction. However, automatic methods for obtaining such correspondences usually require image segmentation followed by significant preprocessing, which is taxing in terms of both computation and human resources. In many cases, the segmentation and subsequent processing require manual guidance and anatomy-specific domain expertise. This paper proposes a self-supervised deep learning approach for discovering landmarks from images that can directly be used as a shape descriptor for subsequent analysis. We use landmark-driven image registration as the primary task to force the neural network to discover landmarks that register the images well. We also propose a regularization term that allows for robust optimization of the neural network and ensures that the landmarks uniformly span the image domain. The proposed method circumvents segmentation and preprocessing and directly produces a usable shape descriptor using just 2D or 3D images. In addition, we propose two variants of the training loss function that allow prior shape information to be integrated into the model. We apply this framework to several 2D and 3D datasets to obtain their shape descriptors. We analyze the efficacy of these shape descriptors in capturing shape information by performing different shape-driven applications depending on the data, ranging from shape clustering to severity prediction to outcome diagnosis.
S. R. Black, A. Janson, M. Mahan, J. Anderson, C. R. Butson. Identification of Deep Brain Stimulation Targets for Neuropathic Pain After Spinal Cord Injury Using Localized Increases in White Matter Fiber Cross‐Section, In Neuromodulation: Technology at the Neural Interface, John Wiley & Sons, Inc., 2021.
The spinal cord injury (SCI) patient population is overwhelmingly affected by neuropathic pain (NP), a secondary condition for which therapeutic options are limited and have a low degree of efficacy. The objective of this study was to identify novel deep brain stimulation (DBS) targets that may theoretically benefit those with NP in the SCI patient population. We hypothesize that localized changes in white matter identified in SCI subjects with NP compared to those without NP could be used to develop an evidence‐based approach to DBS target identification.
K. M. Campbell, H. Dai, Z. Su, M. Bauer, P. T. Fletcher, S. C. Joshi. Structural Connectome Atlas Construction in the Space of Riemannian Metrics, Subtitled arXiv, 2021.
The structural connectome is often represented by fiber bundles generated from various types of tractography. We propose a method of analyzing connectomes by representing them as a Riemannian metric, thereby viewing them as points in an infinite-dimensional manifold. After equipping this space with a natural metric structure, the Ebin metric, we apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. We demonstrate connectome registration and atlas formation using connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.
S. H. Campbell, T. Bidone. 3D Model of Cell Migration and Proliferation in a Tissue Scaffold, In Biophysical Journal, Vol. 120, No. 3, Elsevier, pp. 265a. 2021.
Tissue scaffolds restore tissue functionality without the limitations of transplants. However, successful tissue growth depends on the interplay between scaffold properties and cell activities. It has been previously reported that scaffold porosity and Young's modulus affect cell migration and tissue generation. However, exactly how the geometrical and mechanical properties of a scaffold interact with cell processes remains poorly understood, even though this interplay is essential for successful tissue growth. We developed a 3D computational model that simulates cell migration and proliferation on a scaffold. The model generates an adjustable 3D porous scaffold environment with a defined pore size and Young's modulus. Cells are treated as explicit spherical particles comparable in size to bone-marrow cells and are initially seeded randomly throughout the scaffold. Cells can create adhesions, proliferate, and independently migrate across pores in a random walk. Cell adhesions during migration follow the molecular-clutch mechanism, where traction force from the cells against the scaffold stiffness reinforces adhesion lifetime up to a threshold. We used the model to test how variations in cell proliferation rate, scaffold Young's modulus, and porosity affect cell migration speed. At a low proliferation rate (1 × 10⁻⁷ s⁻¹), the spread of cell speeds is larger than at a high proliferation rate (1 × 10⁻⁶ s⁻¹). A biphasic relation between Young's modulus and cell speed is also observed, reflecting the molecular-clutch mechanism at the level of individual adhesions. These observations are consistent with previous reports regarding fibroblast migration on collagen-glycosaminoglycan scaffolds. Additionally, our model shows that similar cell and pore diameters induce a crowding effect that decreases cell speed. The results from our study provide important insights into the biophysical mechanisms that govern cell motility on scaffolds with different properties for tissue engineering applications.
M. Carlson, X. Zheng, H. Sundar, G. E. Karniadakis, R. M. Kirby. An open-source parallel code for computing the spectral fractional Laplacian on 3D complex geometry domains, In Computer Physics Communications, Vol. 261, North-Holland, pp. 107695. 2021.
We present a spectral element algorithm and open-source code for computing the fractional Laplacian defined by the eigenfunction expansion on finite 2D/3D complex domains with both homogeneous and nonhomogeneous boundaries. We demonstrate the scalability of the spectral element algorithm on large clusters by constructing the fractional Laplacian based on computed eigenvalues and eigenfunctions using up to thousands of CPUs. To demonstrate the accuracy of this eigen-based approach for computing the fractional Laplacian, we approximate the solutions of the fractional diffusion equation using the computed eigenvalues and eigenfunctions on a 2D quadrilateral, and on a 3D cubic and cylindrical domain, and compare the results with the contrived solutions to demonstrate fast convergence. Subsequently, we present simulation results for a fractional diffusion equation on a hand-shaped domain discretized with 3D hexahedra, as well as on a domain constructed from the Hanford site geometry corresponding to nonzero Dirichlet boundary conditions. Finally, we apply the algorithm to solve the surface quasi-geostrophic (SQG) equation on a 2D square with periodic boundaries. Simulation results demonstrate the accuracy, efficiency, and geometric flexibility of our algorithm and that our algorithm can capture the subtle dynamics of anomalous diffusion modeled by the fractional Laplacian on complex geometry domains. The included open-source code is the first of its kind.
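The eigenfunction-expansion definition referred to in this abstract can be illustrated on the simplest possible case. The sketch below assumes the 1D interval (0, π) with homogeneous Dirichlet boundaries, where the Laplacian eigenpairs are known in closed form (eigenvalues k², eigenfunctions sin(kx)); the fractional Laplacian then scales each expansion coefficient by (k²)^s. This is a toy illustration only, not the authors' spectral element code, and the function name and parameters are assumptions.

```python
import math


def spectral_fractional_laplacian(u, s, n_modes=16, n_quad=2000):
    """Return a callable approximating (-Laplacian)^s u on (0, pi).

    Assumes homogeneous Dirichlet boundaries, so the eigenpairs are
    lambda_k = k**2 and phi_k(x) = sin(k x).
    """
    h = math.pi / n_quad
    nodes = [(i + 0.5) * h for i in range(n_quad)]  # midpoint quadrature nodes
    coeffs = []
    for k in range(1, n_modes + 1):
        # Expansion coefficient: u_k = (2 / pi) * integral of u(x) sin(k x) dx
        integral = sum(u(x) * math.sin(k * x) for x in nodes) * h
        coeffs.append(2.0 / math.pi * integral)

    def result(x):
        # Scale each mode by its eigenvalue raised to the power s
        return sum((k * k) ** s * c * math.sin(k * x)
                   for k, c in enumerate(coeffs, start=1))

    return result
```

For `u(x) = sin(x) + 0.5 * sin(3 * x)` and `s = 0.5`, the result should match `sin(x) + 1.5 * sin(3 * x)`, since the two modes are scaled by 1 and 3 respectively. On complex 2D/3D geometries the eigenpairs are not available in closed form and have to be computed numerically, which is the part the paper addresses.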
Y. Chen, C. McNabb, T. Bidone. Computational Model of E-cadherin Clustering under Cortical Tension, In Biophysical Journal, Vol. 120, No. 3, Elsevier, pp. 236a. 2021.
E-cadherins are adhesion proteins that play a critical role in the formation of cell-cell junctions for several physiological processes, including tissue development and homeostasis. The formation of E-cadherin clusters involves extracellular trans- and cis-associations between cadherin ectodomains and stabilization through intracellular coupling with the contractile actomyosin cortex. The dynamic remodeling of cell-cell junctions largely depends on cortical tension, but previous modeling frameworks did not incorporate this effect. In order to gain insights into the effects of cortical tension on the dynamic properties of E-cadherin clusters, here we developed a computational model based on Brownian dynamics. The model considers individual cadherins as explicit point particles undergoing cycles of lateral diffusion on two parallel surfaces that mimic the membrane of neighboring cells. E-cadherins transit between …
Y. Chen, L. Ji, A. Narayan, Z. Xu. L1-based reduced over collocation and hyper reduction for steady state and time-dependent nonlinear equations, In Journal of Scientific Computing, Vol. 87, No. 1, Springer US, pp. 1--21. 2021.
The task of repeatedly solving parametrized partial differential equations (pPDEs) in optimization, control, or interactive applications makes it imperative to design highly efficient and equally accurate surrogate models. The reduced basis method (RBM) presents itself as such an option. Accompanied by a mathematically rigorous error estimator, RBM carefully constructs a low-dimensional subspace of the parameter-induced high fidelity solution manifold on which an approximate solution is computed. It can improve efficiency by several orders of magnitude by leveraging an offline-online decomposition procedure. However, this decomposition, usually implemented with aid from the empirical interpolation method (EIM) for nonlinear and/or parametric-nonaffine PDEs, can be challenging to implement or can result in severely degraded online efficiency. In this paper, we augment and extend the EIM approach as a direct solver, as opposed to an assistant, for solving nonlinear pPDEs on the reduced level. The resulting method, called Reduced Over-Collocation method (ROC), is stable and capable of avoiding efficiency degradation exhibited in traditional applications of EIM. Two critical ingredients of the scheme are collocation at about twice as many locations as the dimension of the reduced approximation space, and an efficient L1-norm-based error indicator for the strategic selection of the parameter values whose snapshots span the reduced approximation space. Together, these two ingredients ensure that the proposed L1-ROC scheme is both offline- and online-efficient. A distinctive feature is that the efficiency degradation appearing in alternative RBM approaches that utilize EIM for nonlinear and nonaffine problems is circumvented, both in the offline and online stages. Numerical tests on different families of time-dependent and steady-state nonlinear problems demonstrate the high efficiency and accuracy of L1-ROC and its superior stability performance.
J. Chilleri, Y. He, D. Bedrov, R. M. Kirby. Optimal allocation of computational resources based on Gaussian process: Application to molecular dynamics simulations, In Computational Materials Science, Vol. 188, Elsevier, pp. 110178. 2021.
Simulation models have been utilized in a wide range of real-world applications for behavior predictions of complex physical systems or material designs of large structures. While extensive simulation is mathematically preferable, external limitations such as available resources are often necessary considerations. With a fixed computational resource (i.e., total simulation time), we propose a Gaussian process-based numerical optimization framework for optimal time allocation over simulations at different locations, so that a surrogate model with uncertainty estimation can be constructed to approximate the full simulation. The proposed framework is demonstrated first via two synthetic problems, and later using a real test case of a glass-forming system with divergent dynamic relaxations where a Gaussian process is constructed to estimate the diffusivity and its uncertainty with respect to the temperature.
D. Dai, Y. Epshteyn, A. Narayan. Hyperbolicity-Preserving and Well-Balanced Stochastic Galerkin Method for Two-Dimensional Shallow Water Equations, In SIAM Journal on Scientific Computing, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A929-A952. 2021.
Stochastic Galerkin formulations of the two-dimensional shallow water systems parameterized with random variables may lose hyperbolicity, and hence change the nature of the original model. In this work, we present a hyperbolicity-preserving stochastic Galerkin formulation by carefully selecting the polynomial chaos approximations to the nonlinear terms in the shallow water equations. We derive a sufficient condition to preserve the hyperbolicity of the stochastic Galerkin system which requires only a finite collection of positivity conditions on the stochastic water height at selected quadrature points in parameter space. Based on our theoretical results for the stochastic Galerkin formulation, we develop a corresponding well-balanced hyperbolicity-preserving central-upwind scheme. We demonstrate the accuracy and the robustness of the new scheme on several challenging numerical tests.
E. Deelman, A. Mandal, A. P. Murillo, J. Nabrzyski, V. Pascucci, R. Ricci, I. Baldin, S. Sons, L. Christopherson, C. Vardeman, R. F. da Silva, J. Wyngaard, S. Petruzza, M. Rynge, K. Vahi, W. R. Whitcup, J. Drake, E. Scott. Blueprint: Cyberinfrastructure Center of Excellence, Subtitled arXiv, 2021.
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand MFs' CI, the Pilot has developed and validated a model of the MF data lifecycle that follows the data generation and management within a facility and gained an understanding of how this model captures the fundamental stages that the facilities' data passes through from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a …
M. D. Foote, P. E. Dennison, P. R. Sullivan, K. B. O'Neill, A. K. Thorpe, D. R. Thompson, D. H. Cusworth, R. Duren, S. Joshi. Impact of scene-specific enhancement spectra on matched filter greenhouse gas retrievals from imaging spectroscopy, In Remote Sensing of Environment, Vol. 264, Elsevier, pp. 112574. 2021.
Matched filter techniques have been widely used for retrieval of greenhouse gas enhancements from imaging spectroscopy datasets. While multiple algorithmic techniques and refinements have been proposed, the greenhouse gas target spectrum used for concentration enhancement estimation has remained largely unaltered since the introduction of quantitative matched filter retrievals. The magnitude of retrieved methane and carbon dioxide enhancements, and thereby integrated mass enhancements (IME) and estimated flux of point-source emitters, is heavily dependent on this target spectrum. Current standard use of molecular absorption coefficients to create unit enhancement target spectra does not account for absorption by background concentrations of greenhouse gases, solar and sensor geometry, or atmospheric water vapor absorption. We introduce geometric and atmospheric parameters into the generation of scene-specific unit enhancement spectra to provide target spectra that are compatible with all greenhouse gas retrieval matched filter techniques. Specifically, we use radiative transfer modeling to model four parameters that are expected to change between scenes: solar zenith angle, column water vapor, ground elevation, and sensor altitude. These parameter values are well defined, with low variation within a single scene. A benchmark dataset consisting of ten AVIRIS-NG airborne imaging spectrometer scenes was used to compare IME retrieved using a matched filter algorithm. For methane plumes, IME resulting from use of standard, generic enhancement spectra varied from −22 to +28.7% compared to scene-specific enhancement spectra. Due to differences in spectral shape between the generic and scene-specific enhancement spectra, differences in methane plume IME were linked to surface spectral characteristics in addition to geometric and atmospheric parameters. 
IME differences were much larger for carbon dioxide plumes, with generic enhancement spectra producing integrated mass enhancements −76.1 to −48.1% compared to scene-specific enhancement spectra. Fluxes calculated from these integrated enhancements would vary by the same percentages, assuming equivalent wind conditions. Methane and carbon dioxide IME were most sensitive to changes in solar zenith angle and ground elevation. We introduce an interpolation approach that can efficiently generate scene-specific unit enhancement spectra for given sets of parameters. Scene-specific target spectra can improve confidence in greenhouse gas retrievals and flux estimates across collections of scenes with diverse geometric and atmospheric conditions.
W. W. Good, B. Zenger, J. A. Bergquist, L. C. Rupp, K. K. Gillette, M. A.F. Gsell, G. Plank, R. S. MacLeod. Quantifying the spatiotemporal influence of acute myocardial ischemia on volumetric conduction velocity, In Journal of Electrocardiology, Vol. 66, Churchill Livingstone, pp. 86-94. 2021.
Acute myocardial ischemia occurs when coronary perfusion to the heart is inadequate, which can perturb the highly organized electrical activation of the heart and can result in adverse cardiac events including sudden cardiac death. Ischemia is known to influence the ST and repolarization phases of the ECG, but it also has a marked effect on propagation (QRS); however, studies investigating propagation during ischemia have been limited.
Objective: In this study, we have used whole heart simulations parameterized with large animal experiments to validate three techniques (two from the literature and one novel) for estimating epicardial and volumetric conduction velocity (CV). Methods: We used an eikonal-based simulation model to generate ground truth activation sequences with prescribed CVs. Using the sampling density achieved experimentally, we examined the accuracy with which we could reconstruct the wavefront, and then examined the robustness of three CV estimation techniques to reconstruction-related error. We examined triangulation-based, inverse-gradient-based, and streamline-based techniques for estimating CV across the surface and within the volume of the heart. Results: The reconstructed activation times agreed closely with simulated values, with 50-70% of the volumetric nodes and 97-99% of the epicardial nodes within 1 ms of the ground truth. We found close agreement between the CVs calculated using reconstructed versus ground truth activation times, with differences in the median estimated CV on the order of 3-5% volumetrically and 1-2% superficially, regardless of what technique was used. Conclusion: Our results indicate that the wavefront reconstruction and CV estimation techniques are accurate, allowing us to examine changes in propagation induced by experimental interventions such as acute ischemia, ectopic pacing, or drugs. Significance: We implemented, validated, and compared the performance of a number of CV estimation techniques. The CV estimation techniques implemented in this study produce accurate, high-resolution CV fields that can be used to study propagation in the heart experimentally and clinically.
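Of the three families of techniques named in this abstract, the inverse-gradient idea is the simplest to sketch: the local conduction velocity is the reciprocal of the magnitude of the spatial gradient of activation time. The example below assumes a regular 2D grid and central finite differences, which is a simplification; the paper works on cardiac surfaces and volumes, and the function names, grid layout, and the planar-wave test field here are illustrative assumptions.

```python
import math


def inverse_gradient_cv(times, spacing):
    """Estimate conduction velocity at interior nodes of a regular 2D grid.

    times[i][j] is the activation time at x = j * spacing, y = i * spacing.
    Returns a dict mapping (i, j) to CV in spacing units per time unit.
    """
    rows, cols = len(times), len(times[0])
    cv = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Central differences for the activation-time gradient
            dt_dx = (times[i][j + 1] - times[i][j - 1]) / (2.0 * spacing)
            dt_dy = (times[i + 1][j] - times[i - 1][j]) / (2.0 * spacing)
            grad = math.hypot(dt_dx, dt_dy)
            cv[(i, j)] = 1.0 / grad if grad > 0.0 else float("inf")
    return cv


def planar_wave(rows, cols, spacing, speed, angle):
    """Activation times of a planar wavefront with a prescribed speed and direction."""
    return [[(j * spacing * math.cos(angle) + i * spacing * math.sin(angle)) / speed
             for j in range(cols)]
            for i in range(rows)]
```

Feeding `planar_wave(5, 5, 1.0, 0.6, 0.5)` into `inverse_gradient_cv` should recover the prescribed speed of 0.6 at every interior node, mirroring in miniature the paper's strategy of validating against activation sequences with known CVs. With spacing in mm and times in ms, the result is in mm/ms, which equals m/s.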
A. A. Gooch, S. Petruzza, A. Gyulassy, G. Scorzelli, V. Pascucci, L. Rantham, W. Adcock, C. Coopmans. Lessons learned towards the immediate delivery of massive aerial imagery to farmers and crop consultants, In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping VI, Vol. 11747, International Society for Optics and Photonics, pp. 22 -- 34. 2021.
In this paper, we document lessons learned from using ViSOAR Ag Explorer™ in the fields of Arkansas and Utah in the 2018-2020 growing seasons. Our insights come from creating software with fast reading and writing of 2D aerial image mosaics for platform-agnostic collaborative analytics and visualization. We currently enable stitching in the field on a laptop without the need for an internet connection. The full resolution result is then available for instant streaming visualization and analytics via Python scripting. While our software, ViSOAR Ag Explorer™, removes the time and labor software bottleneck in processing large aerial surveys, enabling a cost-effective process to deliver actionable information to farmers, we learned valuable lessons with regard to the acquisition, storage, viewing, analysis, and planning stages of aerial data surveys. Additionally, with the ultimate goal of stitching thousands of images in minutes on board a UAV at the time of data capture, we performed preliminary tests for on-board, real-time stitching and analysis on USU AggieAir sUAS using lightweight computational resources. This system is able to create a 2D map while flying and allow interactive exploration of the full resolution data as soon as the platform has landed or has access to a network. This capability further speeds up the assessment process in the field and opens opportunities for new real-time photogrammetry applications. Flying and imaging over 1500-2000 acres per week provides up-to-date maps that give crop consultants a much broader view of the field in general, as well as a better view into planting and field preparation than could be observed from field level. Ultimately, our software and hardware could provide a much better understanding of weed presence and intensity or lack thereof.
J. K. Holmen, D. Sahasrabudhe, M. Berzins. A Heterogeneous MPI+PPL Task Scheduling Approach for Asynchronous Many-Task Runtime Systems, In Proceedings of the Practice and Experience in Advanced Research Computing 2021 on Sustainability, Success and Impact (PEARC21), ACM, 2021.
Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have shown promise for helping manage the increasing complexity of nodes in current and emerging high performance computing (HPC) systems, including those for exascale. The increasing architectural diversity, however, poses challenges for large legacy runtime systems emphasizing broad support for major HPC systems. Performance portability layers (PPL) have shown promise for helping manage this diversity. This paper describes a heterogeneous MPI+PPL task scheduling approach for combining these promising solutions with additional consideration for parallel third party libraries facing similar challenges to help prepare such a runtime for the diverse heterogeneous systems accompanying exascale computing. This approach is demonstrated using a heterogeneous MPI+Kokkos task scheduler and the accompanying portable abstractions implemented in the Uintah Computational Framework, an asynchronous many-task runtime system, with additional consideration for hypre, a parallel third party library. Results are shown for two challenging problems executing workloads representative of typical Uintah applications. These results show performance improvements up to 4.4x when using this scheduler and the accompanying portable abstractions to port a previously MPI-Only problem to Kokkos::OpenMP and Kokkos::CUDA to improve multi-socket, multi-device node use. Good strong-scaling to 1,024 NVIDIA V100 GPUs and 512 IBM POWER9 processors is also shown using MPI+Kokkos::OpenMP+Kokkos::CUDA at scale.