W. W. Good, B. Zenger, J. A. Bergquist, L. C. Rupp, K. Gillett, N. Angel, D. Chou, G. Plank, R. S. MacLeod.
Combining endocardial mapping and electrocardiographic imaging (ECGI) for improving PVC localization: A feasibility study, In Journal of Electrocardiology, 2021.
Accurate reconstruction of cardiac activation wavefronts is crucial for clinical diagnosis, management, and treatment of cardiac arrhythmias. Furthermore, reconstruction of activation profiles within the intramural myocardium has long been impossible because electrical mapping was only performed on the endocardial surface. Recent advancements in electrocardiographic imaging (ECGI) have made endocardial and epicardial activation mapping possible. We propose a novel approach that combines endocardial and epicardial mapping to reconstruct intramural activation times.
J. K. Holmen, D. Sahasrabudhe, M. Berzins. A Heterogeneous MPI+PPL Task Scheduling Approach for Asynchronous Many-Task Runtime Systems, In Proceedings of the Practice and Experience in Advanced Research Computing 2021 on Sustainability, Success and Impact (PEARC21), ACM, 2021.
Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have shown promise for helping manage the increasing complexity of nodes in current and emerging high performance computing (HPC) systems, including those for exascale. The increasing architectural diversity, however, poses challenges for large legacy runtime systems emphasizing broad support for major HPC systems. Performance portability layers (PPL) have shown promise for helping manage this diversity. This paper describes a heterogeneous MPI+PPL task scheduling approach for combining these promising solutions with additional consideration for parallel third party libraries facing similar challenges to help prepare such a runtime for the diverse heterogeneous systems accompanying exascale computing. This approach is demonstrated using a heterogeneous MPI+Kokkos task scheduler and the accompanying portable abstractions implemented in the Uintah Computational Framework, an asynchronous many-task runtime system, with additional consideration for hypre, a parallel third party library. Results are shown for two challenging problems executing workloads representative of typical Uintah applications. These results show performance improvements up to 4.4x when using this scheduler and the accompanying portable abstractions to port a previously MPI-Only problem to Kokkos::OpenMP and Kokkos::CUDA to improve multi-socket, multi-device node use. Good strong-scaling to 1,024 NVIDIA V100 GPUs and 512 IBM POWER9 processors is also shown using MPI+Kokkos::OpenMP+Kokkos::CUDA at scale.
J. K. Holmen, D. Sahasrabudhe, M. Berzins, A. Bardakoff, T. J. Blattner, . Keyrouz. Uintah+Hedgehog: Combining Parallelism Models for End-to-End Large-Scale Simulation Performance, Scientific Computing and Imaging Institute, 2021.
The complexity of heterogeneous nodes near and at exascale has increased the need for “heroic” programming efforts. To accommodate this complexity, significant investment is required for codes not yet optimizing for low-level architecture features (e.g., wide vector units) and/or running at large-scale. This paper describes ongoing efforts to combine two codes, Hedgehog and Uintah, lying at both extremes to ease programming efforts. The end goals of this effort are (1) to combine the two codes to make an asynchronous many-task runtime system specializing in both node-level and large-scale performance and (2) to further improve the accessibility of both with portable abstractions. A prototype adopting Hedgehog in Uintah and a prototype extending Hedgehog to support MPI+X hybrid parallelism are discussed. Results achieving ∼60% of NVIDIA V100 GPU peak performance for a distributed DGEMM problem are shown for a naive MPI+Hedgehog implementation before any attempt to optimize for performance.
Authors' note: This is a refereed but unpublished report that was submitted to, reviewed for, and accepted in revised form for a presentation of the same material at the Hipar Workshop at Supercomputing 21.
Z. Houmani, D. Balouek-Thomert, E. Caron, M. Parashar. Enabling microservices management for Deep Learning applications across the Edge-Cloud Continuum, In SBAC-PAD 2021 - IEEE 33rd International Symposium on Computer Architecture and High Performance Computing, October, 2021.
Deep Learning has shifted the focus of traditional batch workflows to data-driven feature engineering on streaming data. In particular, the execution of Deep Learning workflows presents expectations of near-real-time results with user-defined acceptable accuracy. Meeting the objectives of such applications across heterogeneous resources located at the edge of the network, the core, and in-between requires managing trade-offs between the accuracy and the urgency of the results. However, current data analysis rarely manages the entire Deep Learning pipeline along the data path, making it complex for developers to implement strategies in real-world deployments. Driven by an object detection use case, this paper presents an architecture for time-critical Deep Learning workflows by providing a data-driven scheduling approach to distribute the pipeline across Edge to Cloud resources. Furthermore, it adopts a data management strategy that reduces the resolution of incoming data when potential trade-off optimizations are available. We illustrate the system's viability through a performance evaluation of the object detection use case on the Grid'5000 testbed. We demonstrate that in a multiuser scenario, with a standard frame rate of 25 frames per second, the system speeds up data analysis by up to 54.4% compared to a Cloud-only-based scenario, with an analysis accuracy higher than a fixed threshold.
X. Huang, P. Klacansky, S. Petruzza, A. Gyulassy, P.T. Bremer, V. Pascucci. Distributed merge forest: a new fast and scalable approach for topological analysis at scale, In Proceedings of the ACM International Conference on Supercomputing, pp. 367-377. 2021.
Topological analysis is used in several domains to identify and characterize important features in scientific data, and is now one of the established classes of techniques of proven practical use in scientific computing. The growth in parallelism and problem size tackled by modern simulations poses a particular challenge for these approaches. Fundamentally, the global encoding of topological features necessitates interprocess communication that limits their scaling. In this paper, we extend a new topological paradigm to the case of distributed computing, where the construction of a global merge tree is replaced by a distributed data structure, the merge forest, trading slower individual queries on the structure for faster end-to-end performance and scaling. Empirically, the queries that are most negatively affected also tend to have limited practical use. Our experimental results demonstrate the scalability of both the merge forest construction and the parallel queries needed in scientific workflows, and contrast this scalability with the two established alternatives that construct variations of a global tree.
M. H. Jensen, S. Joshi, S. Sommer. Bridge Simulation and Metric Estimation on Lie Groups, Subtitled arXiv preprint arXiv:2106.03431, 2021.
We present a scheme for simulating Brownian bridges on complete and connected Lie groups. We show how this simulation scheme leads to absolute continuity of the Brownian bridge measure with respect to the guided process measure. This result generalizes the Euclidean result of Delyon and Hu to Lie groups. We present numerical results of the guided process in the Lie group $\mathrm{SO}(3)$. In particular, we apply importance sampling to estimate the metric on $\mathrm{SO}(3)$ using an iterative maximum likelihood method.
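As a rough illustration of the Euclidean setting that the abstract generalizes, the sketch below simulates a guided process on R^d with the drift (v - x)/(T - t), which pulls a Brownian motion toward a target point v at time T. It is a minimal Euler-Maruyama sketch under stated assumptions, not the authors' Lie group scheme: the function name `simulate_guided_bridge`, the step count, and the seed are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_guided_bridge(x0, v, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama discretization of dX = (v - X)/(T - t) dt + dW on R^d.

    Illustrative Euclidean sketch only; the Lie group construction in the
    paper is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    v = np.asarray(v, dtype=float)
    dt = T / n_steps
    path = np.empty((n_steps + 1, x0.size))
    path[0] = x0
    for k in range(n_steps):
        t = k * dt
        # Guiding drift: grows as t approaches T and forces the path toward v.
        drift = (v - path[k]) / (T - t)
        # Brownian increment with variance dt in each coordinate.
        noise = rng.normal(scale=np.sqrt(dt), size=x0.size)
        path[k + 1] = path[k] + drift * dt + noise
    return path

if __name__ == "__main__":
    path = simulate_guided_bridge([0.0, 0.0, 0.0], [1.0, -1.0, 0.5])
    print("start:", path[0], "end:", path[-1])
```

In this discretization the last step lands on v up to a single noise increment of standard deviation sqrt(dt), so the endpoint approaches the target as the step size shrinks.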
M. Højgaard Jensen, L. Hilgendorf, S. Joshi, S. Sommer. Bridge Simulation on Lie Groups and Homogeneous Spaces with Application to Parameter Estimation, Subtitled arXiv:2112.00866, 2021.
X. Jiang, J. C. Font, J. A. Bergquist, B. Zenger, W. W. Good, D. H. Brooks, R. S. MacLeod, L. Wang. Deep Adaptive Electrocardiographic Imaging with Generative Forward Model for Error Reduction, In Functional Imaging and Modeling of the Heart: 11th International Conference, Vol. 12738, Springer Nature, pp. 471. 2021.
Accurately estimating the heart’s electrical activity with electrocardiographic imaging (ECGI) is challenging because it relies on an error-prone physics-based model (the forward model). Although they achieve better results than traditional numerical methods that follow the underlying physics, modern deep learning approaches ignore the physics behind the electrical propagation in the body and do not allow the use of patient-specific geometry. We introduce a deep-learning-based ECGI framework capable of understanding the underlying physics, aware of geometry, and adjustable to patient-specific data. Using a variational autoencoder (VAE), we uncover the forward model’s parameter space, and when solving the inverse problem, these parameters are optimized to reduce the errors in the forward model. In both simulation and real data experiments, we demonstrated the ability of the presented framework to provide accurate reconstruction of the heart’s electrical potentials and localization of the earliest activation sites.
C. R. Johnson.
Translational computer science at the scientific computing and imaging institute, In Journal of Computational Science, Vol. 52, pp. 101217. 2021.
The Scientific Computing and Imaging (SCI) Institute at the University of Utah evolved from the SCI research group, started in 1994 by Professors Chris Johnson and Rob MacLeod. Over time, research centers funded by the National Institutes of Health, Department of Energy, and State of Utah significantly spurred growth, and SCI became a permanent interdisciplinary research institute in 2000. The SCI Institute is now home to more than 150 faculty, students, and staff. The history of the SCI Institute is underpinned by a culture of multidisciplinary, collaborative research, which led to its emergence as an internationally recognized leader in the development and use of visualization, scientific computing, and image analysis research to solve important problems in a broad range of domains in biomedicine, science, and engineering. A particular hallmark of SCI Institute research is the creation of open source software systems, including the SCIRun scientific problem-solving environment, Seg3D, ImageVis3D, Uintah, ViSUS, Nektar++, VisTrails, FluoRender, and FEBio. At this point, the SCI Institute has made more than 50 software packages broadly available to the scientific community under open-source licensing and supports them through web pages, documentation, and user groups. While the vast majority of academic research software is written and maintained by graduate students, the SCI Institute employs several professional software developers to help create, maintain, and document robust, tested, well-engineered open source software. The story of how and why we worked, and often struggled, to make professional software engineers an integral part of an academic research institute is crucial to the larger story of the SCI Institute’s success in translational computer science (TCS).
Several severity metrics have been developed for metopic craniosynostosis, including a recent machine learning-derived algorithm. This study assessed the diagnostic concordance between machine learning and previously published severity indices.
R. Kamali, J. Kump, E. Ghafoori, M. Lange, N. Hu, T. J. Bunch, D. J. Dosdall, R. S. MacLeod, R. Ranjan. Area Available for Atrial Fibrillation to Propagate Is an Important Determinant of Recurrence After Ablation, In JACC: Clinical Electrophysiology, Elsevier, 2021.
This study sought to evaluate atrial fibrillation (AF) ablation outcomes based on scar patterns and contiguous area available for AF wavefronts to propagate.
V. Keshavarzzadeh, M. Alirezaei, T. Tasdizen, R. M. Kirby. Image-Based Multiresolution Topology Optimization Using Deep Disjunctive Normal Shape Model, In Computer-Aided Design, Vol. 130, Elsevier, pp. 102947. 2021.
We present a machine learning framework for predicting the optimized structural topology designs using multiresolution data. Our approach primarily uses optimized designs from inexpensive coarse mesh finite element simulations for model training and generates high resolution images associated with simulation parameters that are not previously used. Our cost-efficient approach enables the designers to effectively search through possible candidate designs in situations where the design requirements rapidly change. The underlying neural network framework is based on a deep disjunctive normal shape model (DDNSM) which learns the mapping between the simulation parameters and segments of multiresolution images. Using this image-based analysis we provide a practical algorithm which enhances the predictability of the learning machine by determining a limited number of important parametric samples (i.e., samples of the simulation parameters) on which the high resolution training data is generated. We demonstrate our approach on benchmark compliance minimization problems including the 3D topology optimization where we show that the high-fidelity designs from the learning machine are close to optimal designs and can be used as effective initial guesses for the large-scale optimization problem.
V. Keshavarzzadeh, R. M. Kirby, A. Narayan. Multilevel Designed Quadrature for Partial Differential Equations with Random Inputs, In SIAM Journal on Scientific Computing, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A1412-A1440. 2021.
We introduce a numerical method, multilevel designed quadrature, for computing the statistical solution of partial differential equations with random input data. Similar to multilevel Monte Carlo methods, our method relies on hierarchical spatial approximations in addition to a parametric/stochastic sampling strategy. A key ingredient in multilevel methods is the relationship between the spatial accuracy at each level and the number of stochastic samples required to achieve that accuracy. Our sampling is based on flexible quadrature points that are designed for a prescribed accuracy, which can yield less overall computational cost compared to alternative multilevel methods. We propose a constrained optimization problem that determines the number of samples to balance the approximation error with the computational budget. We further show that the optimization problem is convex and derive analytic formulas for the optimal number of points at each level. We validate the theoretical estimates and the performance of our multilevel method via numerical examples on a linear elasticity and a steady state heat diffusion problem.
Robust topology optimization with low rank approximation using artificial neural networks, In Computational Mechanics, 2021.
We present a low rank approximation approach for topology optimization of parametrized linear elastic structures. The parametrization is considered on loading and stiffness of the structure. The low rank approximation is achieved by identifying a parametric connection among coarse finite element models of the structure (associated with different design iterates) and is used to inform the high fidelity finite element analysis. We build an Artificial Neural Network (ANN) map between low resolution design iterates and their corresponding interpolative coefficients (obtained from low rank approximations) and use this surrogate to perform high resolution parametric topology optimization. We demonstrate our approach on robust topology optimization with compliance constraints/objective functions and develop error bounds for the parametric compliance computations. We verify these parametric computations with more challenging quantities of interest such as the p-norm of von Mises stress. To conclude, we use our approach on a 3D robust topology optimization and show significant reduction in computational cost via quantitative measures.
V. Keshavarzzadeh, S. Zhe, R.M. Kirby, A. Narayan. GP-HMAT: Scalable, $O(n\log (n)) $ Gaussian Process Regression with Hierarchical Low-Rank Matrices, Subtitled arXiv preprint arXiv:2201.00888, 2021.
A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of a GP regression is the covariance kernel, which characterizes the relationship between pairs in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which effectively decomposes the matrix structure, in a recursive manner, into significantly smaller matrices where a direct approach could be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points, which promotes the low-rank structure of off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method for matrix reduction on the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the inversion of the matrix, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT, HMAT adapted for GP likelihood and derivative computations, and the implementation of the last numerical example on a real dataset. We demonstrate superior scalability of the HMAT approach compared to the built-in operator in MATLAB for large-scale linear solves Ax=y via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research.
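The low-rank structure of off-diagonal kernel blocks mentioned above can be illustrated with a small experiment. The sketch below is not the GP-HMAT implementation: it assumes a squared-exponential kernel on sorted 1D points (sorting stands in for the paper's aggregation strategy) and uses a generic randomized range finder to compress one off-diagonal block. The function names, length scale, rank, and oversampling value are all hypothetical choices for this example.

```python
import numpy as np

def rbf_kernel(x, y, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1D points."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def randomized_low_rank(block, rank, oversample=10, seed=0):
    """Generic randomized range finder: block is approximated by q @ b."""
    rng = np.random.default_rng(seed)
    # Random test matrix; only a thin (rows x (rank + oversample)) sketch is factorized.
    omega = rng.normal(size=(block.shape[1], rank + oversample))
    q, _ = np.linalg.qr(block @ omega)  # orthonormal basis for the sampled range
    b = q.T @ block                     # coefficients of the block in that basis
    return q, b

def offdiag_relative_error(n=400, rank=10):
    """Relative Frobenius error of compressing one off-diagonal kernel block."""
    x = np.sort(np.random.default_rng(1).uniform(0.0, 1.0, size=n))
    left, right = x[: n // 2], x[n // 2:]
    block = rbf_kernel(left, right)     # interaction between the two point clusters
    q, b = randomized_low_rank(block, rank)
    return np.linalg.norm(block - q @ b) / np.linalg.norm(block)

if __name__ == "__main__":
    print("relative error of the compressed block:", offdiag_relative_error())
```

Because the two halves of the sorted points are spatially separated, the block is numerically low rank, and a thin factor pair of width 20 stands in for the full 200-by-200 block.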
R. Kirby, K. Nottingham, R. Roy, S. Godil, B. Catanzaro. Guiding Global Placement With Reinforcement Learning, Subtitled arXiv preprint arXiv:2109.02631, 2021.
Recent advances in GPU accelerated global and detail placement have reduced the time to solution by an order of magnitude. This advancement allows us to leverage data driven optimization (such as Reinforcement Learning) in an effort to improve the final quality of placement results. In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent trained to improve the final detail placed Half Perimeter Wire Length (HPWL). We propose novel control schemes with either global or localized control of the placement process. We then train reinforcement learning agents to use these controls to guide placement to improved solutions. In both cases, the augmented optimizer finds improved placement solutions. Our trained agents achieve an average 1% improvement in final detail place HPWL across a range of academic benchmarks and more than 1% in global place HPWL on real industry designs.
We present a method for the browsing of hierarchical 3D models in which we combine the typical navigation of hierarchical structures in a 2D environment---using clicks on nodes, links, or icons---with a 3D spatial data visualization. Our approach is motivated by large molecular models, for which the traditional single-scale navigational metaphors are not suitable. Multi-scale phenomena, e.g., in astronomy or geography, are complex to navigate due to their large data spaces and multi-level organization. Models from structural biology are in addition also densely crowded in space and scale. Cutaways are needed to show individual model subparts. The camera has to support exploration on the level of a whole virus, as well as on the level of a small molecule. We address these challenges by employing HyperLabels: active labels that---in addition to their annotational role---also support user interaction. Clicks on HyperLabels select the next structure to be explored. Then, we adjust the visualization to showcase the inner composition of the selected subpart and enable further exploration. Finally, we use a breadcrumbs panel for orientation and as a mechanism to traverse upwards in the model hierarchy. We demonstrate our concept of hierarchical 3D model browsing using two exemplary models from meso-scale biology.
A.S. Krishnapriyan, A. Gholami, S. Zhe, R.M. Kirby, M.W. Mahoney. Characterizing possible failure modes in physics-informed neural networks, Subtitled arXiv preprint arXiv:2109.01050, 2021.
Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena even for simple PDEs. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves differential operators, can introduce a number of subtle problems, including making the problem ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training.
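The soft-constraint formulation described in this abstract can be written generically as follows. This is a schematic sketch, not the paper's exact notation: the symbols $\mathcal{L}_u$ (misfit on data, initial, and boundary conditions), $\mathcal{F}$ (the PDE residual operator), $\lambda$ (the regularization weight), and the collocation points $(x_i, t_i)$ are assumptions introduced for illustration.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{u}(\theta) + \lambda\, \mathcal{L}_{\mathcal{F}}(\theta),
\qquad
\mathcal{L}_{\mathcal{F}}(\theta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \bigl| \mathcal{F}\bigl(u_\theta\bigr)(x_i, t_i) \bigr|^{2}
```

Under this reading, curriculum regularization would replace $\mathcal{F}$ with a sequence of residual operators $\mathcal{F}_1, \mathcal{F}_2, \dots$ of increasing difficulty (for instance, a hypothetical schedule that gradually increases a PDE coefficient), with the network trained on each in turn and warm-started from the previous stage.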
L. Kühnel, T. Fletcher, S. Joshi, S. Sommer. Latent Space Geometric Statistics, In Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part VI, Springer International Publishing, pp. 163-178. 2021.
Deep generative models, e.g., variational autoencoders and generative adversarial networks, result in latent representation of observed data. The low dimensionality of the latent space provides an ideal setting for analysing high-dimensional data that would otherwise often be infeasible to handle statistically. The linear Euclidean geometry of the high-dimensional data space pulls back to a nonlinear Riemannian geometry on latent space where classical linear statistical techniques are no longer applicable. We show how analysis of data in their latent space representation can be performed using techniques from the field of geometric statistics. Geometric statistics provide generalisations of Euclidean statistical notions including means, principal component analysis, and maximum likelihood estimation of parametric distributions. Introduction to estimation procedures on latent space are considered, and the …
D. Lange, E. Polanco, R. Judson-Torres, T. Zangle, A. Lex. Loon: Using Exemplars to Visualize Large Scale Microscopy Data, In OSF Preprints, 2021.
Which drug is most promising for a cancer patient? This is a question a new microscopy-based approach for measuring the mass of individual cancer cells treated with different drugs promises to answer in only a few hours. However, the analysis pipeline for extracting data from these images is still far from complete automation: human intervention is necessary for quality control of preprocessing steps such as segmentation, to adjust filters and remove noise, and for the analysis of the results. To address this workflow, we developed Loon, a visualization tool for analyzing drug screening data based on quantitative phase microscopy imaging. Loon visualizes both derived data, such as growth rates, and imaging data. Since the images are collected automatically at a large scale, manual inspection of images and segmentations is infeasible. However, reviewing representative samples of cells is essential, both for quality control and for data analysis. We introduce a new approach of choosing and visualizing representative exemplar cells that retain a close connection to the low-level data. By tightly integrating the derived data visualization capabilities with the novel exemplar visualization and providing selection and filtering capabilities, Loon is well suited for making decisions about which drugs are suitable for a specific patient.