J. Adams, S. Elhabian. Fully Bayesian VIB-DeepSSM, Subtitled arXiv:2305.05797, 2023.
Statistical shape modeling (SSM) enables population-based quantitative analysis of anatomical shapes, informing clinical diagnosis. Deep learning approaches predict correspondence-based SSM directly from unsegmented 3D images but require calibrated uncertainty quantification, motivating Bayesian formulations. Variational information bottleneck DeepSSM (VIB-DeepSSM) is an effective, principled framework for predicting probabilistic shapes of anatomy from images with aleatoric uncertainty quantification. However, VIB is only half-Bayesian and lacks epistemic uncertainty inference. We derive a fully Bayesian VIB formulation from both the probably approximately correct (PAC)-Bayes and variational inference perspectives. We demonstrate the efficacy of two scalable approaches for Bayesian VIB with epistemic uncertainty: concrete dropout and batch ensemble. Additionally, we introduce a novel combination of the two that further enhances uncertainty calibration via multimodal marginalization. Experiments on synthetic shapes and left atrium data demonstrate that the fully Bayesian VIB network predicts SSM from images with improved uncertainty reasoning without sacrificing accuracy.
J. Adams, S. Elhabian. Can point cloud networks learn statistical shape models of anatomies?, Subtitled arXiv:2305.05610, 2023.
Statistical Shape Modeling (SSM) is a valuable tool for investigating and quantifying anatomical variations within populations of anatomies. However, traditional correspondence-based SSM generation methods require a time-consuming re-optimization process each time a new subject is added to the cohort, making the inference process prohibitive for clinical research. Additionally, they require complete geometric proxies (e.g., high-resolution binary volumes or surface meshes) as input shapes to construct the SSM. Unordered 3D point cloud representations of shapes are more easily acquired from various medical imaging practices (e.g., thresholded images and surface scanning). Point cloud deep networks have recently achieved remarkable success in learning permutation-invariant features for different point cloud tasks (e.g., completion, semantic segmentation, classification). However, their application to learning SSM from point clouds is to-date unexplored. In this work, we demonstrate that existing point cloud encoder-decoder-based completion networks can provide an untapped potential for SSM, capturing population-level statistical representations of shapes while reducing the inference burden and relaxing the input requirement. We discuss the limitations of these techniques to the SSM application and suggest future improvements. Our work paves the way for further exploration of point cloud deep learning for SSM, a promising avenue for advancing shape analysis literature and broadening SSM to diverse use cases.
J. Adams, S. Elhabian. Point2SSM: Learning Morphological Variations of Anatomies from Point Cloud, Subtitled arXiv:2305.14486, 2023.
We introduce Point2SSM, a novel unsupervised learning approach that can accurately construct correspondence-based statistical shape models (SSMs) of anatomy directly from point clouds. SSMs are crucial in clinical research for analyzing the population-level morphological variation in bones and organs. However, traditional methods for creating SSMs have limitations that hinder their widespread adoption, such as the need for noise-free surface meshes or binary volumes, reliance on assumptions or predefined templates, and simultaneous optimization of the entire cohort leading to lengthy inference times given new data. Point2SSM overcomes these barriers by providing a data-driven solution that infers SSMs directly from raw point clouds, reducing inference burdens and increasing applicability as point clouds are more easily acquired. Deep learning on 3D point clouds has seen recent success in unsupervised representation learning, point-to-point matching, and shape correspondence; however, their application to constructing SSMs of anatomies is largely unexplored. In this work, we benchmark state-of-the-art point cloud deep networks on the task of SSM and demonstrate that they are not robust to the challenges of anatomical SSM, such as noisy, sparse, or incomplete input and significantly limited training data. Point2SSM addresses these challenges via an attention-based module that provides correspondence mappings from learned point features. We demonstrate that the proposed method significantly outperforms existing networks in terms of both accurate surface sampling and correspondence, better capturing population-level statistics.
D. Akbaba, D. Lange, M. Correll, A. Lex, M. Meyer. Troubling Collaboration: Matters of Care for Visualization Design Study, In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23),, pp. 23--28. April, 2023.
A common research process in visualization is for visualization researchers to collaborate with domain experts to solve particular applied data problems. While there is existing guidance and expertise around how to structure collaborations to strengthen research contributions, there is comparatively little guidance on how to navigate the implications of, and power produced through the socio-technical entanglements of collaborations. In this paper, we qualitatively analyze refective interviews of past participants of collaborations from multiple perspectives: visualization graduate students, visualization professors, and domain collaborators. We juxtapose the perspectives of these individuals, revealing tensions about the tools that are built and the relationships that are formed — a complex web of competing motivations. Through the lens of matters of care, we interpret this web, concluding with considerations that both trouble and necessitate reformation of current patterns around collaborative work in visualization design studies to promote more equitable, useful, and care-ful outcomes.
J. Baker, E. Cherkaev, A. Narayan, B. Wang. Learning Proper Orthogonal Decomposition of Complex Dynamics Using Heavy-ball Neural ODEs, In Journal of Scientific Computing, Vol. 95, No. 14, 2023.
Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. This paper extends the recently proposed heavy-ball neural ODEs (HBNODEs) (Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots constructed by solving full-order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-range dependencies effectively from sequential observations, which is crucial for learning intrinsic patterns from sequential data, and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von Kármán Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.
J.W. Beiriger, W. Tao, M.K. Bruce, E. Anstadt, C. Christiensen, J. Smetona, R. Whitaker, J. Goldstein. CranioRate TM: An Image-Based, Deep-Phenotyping Analysis Toolset and Online Clinician Interface for Metopic Craniosynostosis, In Plastic and Reconstructive Surgery, 2023.
The diagnosis and management of metopic craniosynostosis involves subjective decision-making at the point of care. The purpose of this work is to describe a quantitative severity metric and point-of-care user interface to aid clinicians in the management of metopic craniosynostosis and to provide a platform for future research through deep phenotyping.
Two machine-learning algorithms were developed that quantify the severity of craniosynostosis – a supervised model specific to metopic craniosynostosis (Metopic Severity Score) and an unsupervised model used for cranial morphology in general (Cranial Morphology Deviation). CT imaging from multiple institutions were compiled to establish the spectrum of severity and a point-of-care tool was developed and validated.
Over the study period (2019-2021), 254 patients with metopic craniosynostosis and 92 control patients who underwent CT scan between the ages of 6 and 18 months were included. Scans were processed using an unsupervised machine-learning based dysmorphology quantification tool, CranioRate TM. The average Metopic severity score (MSS) for normal controls was 0.0±1.0 and for metopic synostosis was 4.9±2.3 (p<0.001). The average Cranial Morphology Deviation (CMD) for normal controls was 85.2±19.2 and for metopic synostosis was 189.9±43.4 (p<0.001). A point-of-care user interface (craniorate.org) has processed 46 CT images from 10 institutions.
T.C. Bidone, D.J. Odde. Multiscale models of integrins and cellular adhesions, In Current Opinion in Structural Biology, Vol. 80, Elsevier, 2023.
Computational models of integrin-based adhesion complexes have revealed important insights into the mechanisms by which cells establish connections with their external environment. However, how changes in conformation and function of individual adhesion proteins regulate the dynamics of whole adhesion complexes remains largely elusive. This is because of the large separation in time and length scales between the dynamics of individual adhesion proteins (nanoseconds and nanometers) and the emergent dynamics of the whole adhesion complex (seconds and micrometers), and the limitations of molecular simulation approaches in extracting accurate free energies, conformational transitions, reaction mechanisms, and kinetic rates, that can inform mechanisms at the larger scales. In this review, we discuss models of integrin-based adhesion complexes and highlight their main findings regarding: (i) the conformational transitions of integrins at the molecular and macromolecular scales and (ii) the molecular clutch mechanism at the mesoscale. Lastly, we present unanswered questions in the field of modeling adhesions and propose new ideas for future exciting modeling opportunities.
S. Brink, M. McKinsey, D. Boehme, C. Scully-Allison, I. Lumsden, D. Hawkins, T. Burgess, V. Lama, J. Luettgau, K.E. Isaacs, M. Taufer, O. Pearce. Thicket: Seeing the Performance Experiment Forest for the Individual Run Trees, In HPDC ’23, ACM, 2023.
Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data.
Thicket has a modular structure composed of three components. The first component is a data structure for multi-dimensional performance data, which is composed automatically on the portable basis of call trees, and accommodates any subset of dimensions present in the dataset. The second is the metadata, enabling distinction and sub-selection of dimensions in performance data. The third is a dimensionality reduction mechanism, enabling analysis such as computing aggregated statistics on a given data dimension. Extensible mechanisms are available for applying analyses (e.g., top-down on Intel CPUs), data science techniques (e.g., K-means clustering from scikit-learn), modeling performance (e.g., Extra-P), and interactive visualization. We demonstrate the power and flexibility of Thicket through two case studies, first with the open-source RAJA Performance Suite on CPU and GPU clusters and another with a arge physics simulation run on both a traditional HPC cluster and an AWS Parallel Cluster instance.
H. Dai, M. Penwarden, R.M. Kirby, S. Joshi. Neural Operator Learning for Ultrasound Tomography Inversion, Subtitled arXiv:2304.03297v1, 2023.
Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a full-wave solver to generate the training data. This novel application of operator learning circumnavigates the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step in potential real-time predictions of soft tissue distribution for tumor identification in beast imaging.
R. Han, A. Narayan, Y. Xu. An approximate control variates approach to multifidelity distribution estimation, Subtitled arXiv:2303.06422v1, 2023.
Forward simulation-based uncertainty quantification that studies the output distribution of quantities of interest (QoI) is a crucial component for computationally robust statistics and engineering. There is a large body of literature devoted to accurately assessing statistics of QoI, and in particular, multilevel or multifidelity approaches are known to be effective, leveraging cost-accuracy tradeoffs between a given ensemble of models. However, effective algorithms that can estimate the full distribution of outputs are still under active development. In this paper, we introduce a general multifidelity framework for estimating the cumulative distribution functions (CDFs) of vector-valued QoI associated with a high-fidelity model under a budget constraint. Given a family of appropriate control variates obtained from lower fidelity surrogates, our framework involves identifying the most cost-effective model subset and then using it to build an approximate control variates estimator for the target CDF. We instantiate the framework by constructing a family of control variates using intermediate linear approximators and rigorously analyze the corresponding algorithm. Our analysis reveals that the resulting CDF estimator is uniformly consistent and budget-asymptotically optimal, with only mild moment and regularity assumptions. The approach provides a robust multifidelity CDF estimator that is adaptive to the available budget, does not require a priori knowledge of cross-model statistics or model hierarchy, and is applicable to general output dimensions. We demonstrate the efficiency and robustness of the approach using several test examples.
K. G. Hicks, A. A. Cluntun, H. L. Schubert, S. R. Hackett, J. A. Berg, P. G. Leonard, M. A. Ajalla Aleixo, Y. Zhou, A. J. Bott, S. R. Salvatore, F. Chang, A. Blevins, P. Barta, S. Tilley, A. Leifer, A. Guzman, A. Arok, S. Fogarty, J. M. Winter, H. Ahn, K. N. Allen, S. Block, I. A. Cardoso, J. Ding, I. Dreveny, C. Gasper, Q. Ho, A. Matsuura, M. J. Palladino, S. Prajapati, P. Sun, K. Tittmann, D. R. Tolan, J. Unterlass, A. P. VanDemark, M. G. Vander Heiden, B. A. Webb, C. Yun, P. Zhap, B. Wang, F. J. Schopfer, C. P. Hill, M. C. Nonato, F. L. Muller, J. E. Cox, J. Rutter. Protein-metabolite interactomics of carbohydrate metabolism reveal regulation of lactate dehydrogenase, In Science, Vol. 379, No. 6636, pp. 996-1003. 2023.
Metabolic networks are interconnected and influence diverse cellular processes. The protein-metabolite interactions that mediate these networks are frequently low affinity and challenging to systematically discover. We developed mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically (MIDAS) to identify such interactions. Analysis of 33 enzymes from human carbohydrate metabolism identified 830 protein-metabolite interactions, including known regulators, substrates, and products as well as previously unreported interactions. We functionally validated a subset of interactions, including the isoform-specific inhibition of lactate dehydrogenase by long-chain acyl–coenzyme A. Cell treatment with fatty acids caused a loss of pyruvate-lactate interconversion dependent on lactate dehydrogenase isoform expression. These protein-metabolite interactions may contribute to the dynamic, tissue-specific metabolic flexibility that enables growth and survival in an ever-changing nutrient environment. Understanding how metabolic state influences cellular processes requires systematic analysis of low-affinity interactions of metabolites with proteins. Hicks et al. describe a method called MIDAS (mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically), which allowed them to probe such interactions for 33 enzymes of human carbohydrate metabolism and more than 400 metabolites. The authors detected many known and many new interactions, including regulation of lactate dehydrogenase by ATP and long-chain acyl coenzyme A, which may help to explain known physiological relations between fat and carbohydrate metabolism in different tissues. —LBR A mass spectrometry and dialysis method detects metabolite-protein interactions that help to control physiology.
D. Hoang, H. Bhatia, P. Lindstrom, V. Pascucci. Progressive Tree-Based Compression of Large-Scale Particle Data, In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1--18. 2023.
Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.
M. Hu, K. Zhang, Q. Nguyen, T. Tasdizen. The effects of passive design on indoor thermal comfort and energy savings for residential buildings in hot climates: A systematic review, In Urban Climate, Vol. 49, pp. 101466. 2023.
In this study, a systematic review and meta-analysis were conducted to identify, categorize, and investigate the effectiveness of passive cooling strategies (PCSs) for residential buildings. Forty-two studies published between 2000 and 2021 were reviewed; they examined the effects of PCSs on indoor temperature decrease, cooling load reduction, energy savings, and thermal comfort hour extension. In total, 30 passive strategies were identified and classified into three categories: design approach, building envelope, and passive cooling system. The review found that using various passive strategies can achieve, on average, (i) an indoor temperature decrease of 2.2 °C, (ii) a cooling load reduction of 31%, (iii) energy savings of 29%, and (v) a thermal comfort hour extension of 23%. Moreover, the five most effective passive strategies were identified as well as the differences between hot and dry climates and hot and humid climates.
K. Iyer, S. Elhabian. Mesh2SSM: From Surface Meshes to Statistical Shape Models of Anatomy, Subtitled arXiv:2305.07805, 2023.
Statistical shape modeling is the computational process of discovering significant shape parameters from segmented anatomies captured by medical images (such as MRI and CT scans), which can fully describe subject-specific anatomy in the context of a population. The presence of substantial non-linear variability in human anatomy often makes the traditional shape modeling process challenging. Deep learning techniques can learn complex non-linear representations of shapes and generate statistical shape models that are more faithful to the underlying population-level variability. However, existing deep learning models still have limitations and require established/optimized shape models for training. We propose Mesh2SSM, a new approach that leverages unsupervised, permutation-invariant representation learning to estimate how to deform a template point cloud to subject-specific meshes, forming a correspondence-based shape model. Mesh2SSM can also learn a population-specific template, reducing any bias due to template selection. The proposed method operates directly on meshes and is computationally efficient, making it an attractive alternative to traditional and deep learning-based SSM approaches.
R. Kamali, E. Kwan, M. Regouski, T.J. Bunch, D.J. Dosdall, E. Hsu, R. S. Macleod, I. Polejaeva, R. Ranjan. Contribution of atrial myofiber architecture to atrial fibrillation, In PLOS ONE, Vol. 18, No. 1, Public Library of Science, pp. 1--16. Jan, 2023.
The role of fiber orientation on a global chamber level in sustaining atrial fibrillation (AF) is unknown. The goal of this study was to correlate the fiber direction derived from Diffusion Tensor Imaging (DTI) with AF inducibility.
T. Kataria, S. Rajamani, A.B. Ayubi, M. Bronner, J. Jedrzkiewicz, B. Knudsen, S. Elhabian. Automating Ground Truth Annotations For Gland Segmentation Through Immunohistochemistry, 2023.
The microscopic evaluation of glands in the colon is of utmost importance in the diagnosis of inflammatory bowel disease (IBD) and cancer. When properly trained, deep learning pipelines can provide a systematic, reproducible, and quantitative assessment of disease-related changes in glandular tissue architecture. The training and testing of deep learning models require large amounts of manual annotations, which are difficult, time-consuming, and expensive to obtain. Here, we propose a method for the automated generation of ground truth in digital H&E slides using immunohistochemistry (IHC) labels. The image processing pipeline generates annotations of glands in H&E histopathology images from colon biopsies by transfer of gland masks from CK8/18, CDX2, or EpCAM IHC. The IHC gland outlines are transferred to co-registered H&E images for the training of deep learning models. We compare the performance of the deep learning models to manual annotations using an internal held-out set of biopsies as well as two public data sets. Our results show that EpCAM IHC provides gland outlines that closely match manual gland annotations (DICE = 0.89) and are robust to damage by inflammation. In addition, we propose a simple data sampling technique that allows models trained on data from several sources to be adapted to a new data source using just a few newly annotated samples. The best-performing models achieved average DICE scores of 0.902 and 0.89, respectively, on GLAS and CRAG colon cancer public datasets when trained with only 10% of annotated cases from either public cohort. Altogether, the performances of our models indicate that automated annotations using cell type-specific IHC markers can safely replace manual annotations. The automated IHC labels from single institution cohorts can be combined with small numbers of hand-annotated cases from multi-institutional cohorts to train models that generalize well to diverse data sources.
T. Kataria, B. Knudsen, S. Elhabian. Unsupervised Domain Adaptation for Semantic Segmentation via Feature-space Density Matching, Subtitled arXiv:2305.05789, 2023.
Semantic segmentation is a critical step in automated image interpretation and analysis where pixels are classified into one or more predefined semantically meaningful classes. Deep learning approaches for semantic segmentation rely on harnessing the power of annotated images to learn features indicative of these semantic classes. Nonetheless, they often fail to generalize when there is a significant domain (i.e., distributional) shift between the training (i.e., source) data and the dataset(s) encountered when deployed (i.e., target), necessitating manual annotations for the target data to achieve acceptable performance. This is especially important in medical imaging because different image modalities have significant intra- and inter-site variations due to protocol and vendor variability. Current techniques are sensitive to hyperparameter tuning and target dataset size. This paper presents an unsupervised domain adaptation approach for semantic segmentation that alleviates the need for annotating target data. Using kernel density estimation, we match the target data distribution to the source data in the feature space. We demonstrate that our results are comparable or superior on multiple-site prostate MRI and histopathology images, which mitigates the need for annotating target data.
S. Leventhal, A. Gyulassy, M. Heimann, V. Pascucci. Exploring Classification of Topological Priors with Machine Learning for Feature Extraction, In IEEE Transactions on Visualization and Computer Graphics, pp. 1--12. 2023.
In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects allows researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have been focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first, create geometric priors, and then apply machine learning to classify. This approach is empirically motivated since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this paper, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, with similar accuracy, improved execution time, and requiring marginal training data.
H. Lin, M. Lisnic, D. Akbaba, M. Meyer, A. Lex. Here’s what you need to know about my data: Exploring Expert Knowledge’s Role in Data Analysis, 2023.
Data driven decision making has become the gold standard in science, industry, and public policy. Yet data alone, as an imperfect and partial representation of reality, is often insufficient to make good analysis decisions. Knowledge about the context of a dataset, its strengths and weaknesses, and its applicability for certain tasks is essential. In this work, we present an interview study with analysts from a wide range of domains and with varied expertise and experience inquiring about the role of contextual knowledge. We provide insights into how data is insufficient in analysts workflows and how they incorporate other sources of knowledge into their analysis. We also suggest design opportunities to better and more robustly consider both, knowledge and data in analysis processes.
H. Oh, R. Amici, G. Bomarito, S. Zhe, R. Kirby, J. Hochhalter. Genetic Programming Based Symbolic Regression for Analytical Solutions to Differential Equations, Subtitled arXiv:2302.03175v1, 2023.
In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is verified by assessing its ability to recover known analytic solutions for two separate differential equations. The developed method is compared to a conventional, purely data-driven genetic programming based symbolic regression algorithm. The reliability of successful evolution of the true solution, or an algebraic equivalent, is demonstrated.