Publications

SCI Publications

2025

X. Huang, W. Usher, V. Pascucci. “Approximate Puzzlepiece Compositing,” Subtitled “arXiv:2501.12581,” 2025.

ABSTRACT

The increasing demand for larger and higher-fidelity simulations has made Adaptive Mesh Refinement (AMR) and unstructured mesh techniques essential for focusing compute effort and memory cost on just the areas of interest in the simulation domain. The distribution of these meshes over the compute nodes is often determined by balancing compute, memory, and network costs, leading to distributions with jagged, nonconvex boundaries that fit together much like puzzle pieces. It is expensive, and sometimes impossible, to re-partition the data, which poses a challenge for in situ and post hoc visualization because the data cannot be rendered using standard sort-last compositing techniques that require a convex and disjoint data partitioning. We present a new distributed volume rendering and compositing algorithm, Approximate Puzzlepiece Compositing, that enables fast and high-accuracy in-place rendering of AMR and unstructured meshes. Our approach builds on Moment-Based Order-Independent Transparency to achieve a scalable, order-independent compositing algorithm that requires little communication and does not impose requirements on the data partitioning. We evaluate the image quality and scalability of our approach on synthetic data and two large-scale unstructured meshes on HPC systems by comparing it to state-of-the-art sort-last compositing techniques, highlighting our approach’s minimal overhead at higher core counts. We demonstrate that Approximate Puzzlepiece Compositing provides a scalable, high-performance, and high-quality distributed rendering approach applicable to the complex data distributions encountered in large-scale CFD simulations.
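The requirement that standard sort-last compositing have a convex, disjoint partitioning comes down to the fact that the usual front-to-back "over" operator is not commutative, so partial images must be blended in depth order. The minimal Python sketch below uses premultiplied RGBA tuples and a function named `over` chosen for illustration; it shows two semi-transparent layers producing different colors depending on their order. It illustrates the problem only and is not the moment-based order-independent method the paper builds on.

```python
def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA tuples: front composited onto back."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    k = 1.0 - fa  # fraction of the back layer that shows through the front layer
    return (fr + k * br, fg + k * bg, fb + k * bb, fa + k * ba)


red = (0.5, 0.0, 0.0, 0.5)   # half-transparent red, premultiplied
blue = (0.0, 0.0, 0.5, 0.5)  # half-transparent blue, premultiplied
print(over(red, blue))  # (0.5, 0.0, 0.25, 0.75)
print(over(blue, red))  # (0.25, 0.0, 0.5, 0.75)
```

The accumulated opacity is the same either way, but the color is not, which is why a compositor that cannot establish a consistent depth order between jagged, interleaved partitions needs an order-independent formulation instead.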

2024

X. Huang, H. Miao, A. Townsend, K. Champley, J. Tringe, V. Pascucci, P.T. Bremer. “Bimodal Visualization of Industrial X-Ray and Neutron Computed Tomography Data,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
DOI: 10.1109/TVCG.2024.3382607

ABSTRACT

Advanced manufacturing creates increasingly complex objects with material compositions that are often difficult to characterize by a single modality. Our collaborating domain scientists are going beyond traditional methods by employing both X-ray and neutron computed tomography to obtain complementary representations expected to better resolve material boundaries. However, the use of two modalities creates its own challenges for visualization, requiring either complex adjustments of bimodal transfer functions or the need for multiple views. Together with experts in nondestructive evaluation, we designed a novel interactive bimodal visualization approach to create a combined view of the co-registered X-ray and neutron acquisitions of industrial objects. Using an automatic topological segmentation of the bivariate histogram of X-ray and neutron values as a starting point, the system provides a simple yet effective interface to easily create, explore, and adjust a bimodal visualization. We propose a widget with simple brushing interactions that enables the user to quickly correct the segmented histogram results. Our semiautomated system enables domain experts to intuitively explore large bimodal datasets without the need for either advanced segmentation algorithms or knowledge of visualization techniques. We demonstrate our approach using synthetic examples, industrial phantom objects created to stress bimodal scanning techniques, and real-world objects, and we discuss expert feedback.

Z. Li, H. Miao, V. Pascucci, S. Liu. “Visualization Literacy of Multimodal Large Language Models: A Comparative Study,” Subtitled “arXiv:2407.10996,” 2024.

ABSTRACT

The recent introduction of multimodal large language models (MLLMs) combines the inherent power of large language models (LLMs) with the renewed capability to reason about multimodal context. The potential usage scenarios for MLLMs significantly outpace those of their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and to explain the content of a visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly from a visualization-centric perspective.

In this work, we aim to fill the gap by utilizing the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance on two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT-4o, Claude 3 Opus, Gemini 1.5 Pro) with one another and against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans in certain tasks such as identifying correlations, clusters, and hierarchical structures.

Z. Li, S. Liu, X. Yu, K. Bhavya, J. Cao, J.D. Diffenderfer, D. James, P.T. Bremer, V. Pascucci. “‘Understanding Robustness Lottery’: A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1–16. 2024.
ISSN: 1941-0506
DOI: 10.1109/TVCG.2024.3514996

ABSTRACT

Deep learning approaches have provided state-of-the-art performance in many applications by relying on large and overparameterized neural networks. However, such networks are very brittle and are difficult to deploy on resource-limited platforms. Model pruning, i.e., reducing the size of the network, is a widely adopted strategy that can lead to a more robust and compact model. Many heuristics exist for model pruning, but our understanding of the pruning process remains limited due to the black-box nature of a neural network model. Empirical studies show that some heuristics improve performance whereas others can make models more brittle. This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance. To facilitate a comprehensive comparison and characterization of the high-dimensional model feature space, we introduce a visual geometric analysis of feature representations. We evaluated a set of critical geometric concepts decomposed from the commonly adopted classification loss and used them to design a visualization system to compare and highlight the impact of pruning on model performance and feature representation. The proposed tool provides an environment for an in-depth comparison of pruning methods and a comprehensive understanding of how the model responds to common data corruption. By leveraging the proposed visualization, machine learning researchers can reveal the similarities between pruning methods and redundancy in robustness evaluation benchmarks, obtain geometric insights about the differences between pruned models that achieve superior robustness performance, and identify samples that are robust or fragile to model pruning and common data corruption.

A. Panta, G. Scorzelli, A. Gooch, V. Pascucci, H. Lee. “Managing Large-scale Atmospheric and Oceanic Climate Data for Efficient Analysis and On-the-fly Interactive Visualization,” 2024.
DOI: 10.22541/essoar.173238742.20533901/v1

ABSTRACT

Managing vast volumes of climate data, often reaching into terabytes and petabytes, presents significant challenges in terms of storage, accessibility, efficient analysis, and on-the-fly interactive visualization. Traditional data handling techniques are increasingly inadequate for the massive atmospheric and oceanic data generated by modern climate research. We tackled these challenges by reorganizing the native data layout to optimize access and processing, using advanced visualization frameworks such as OpenVisus for real-time interactive exploration, and extracting comprehensive metadata for all available fields to improve data discoverability and usability. Our work utilized extensive datasets, including downscaled projections of various climate variables and high-resolution ocean simulations from the NEX-GDDP-CMIP6 and NASA DYAMOND datasets. By transforming the data into progressive, streaming-capable formats and incorporating ARCO (Analysis Ready, Cloud Optimized) features before moving them to the cloud, we ensured that the data is highly accessible and efficient to analyze, while allowing direct access to data subsets in the cloud. Direct integration with the Python library Xarray provides efficient and easy access to the data, leveraging the familiarity most climate scientists have with it. This approach, combined with the progressive streaming format, not only enhances the findability, shareability, and reusability of the data but also enables sophisticated analyses and visualizations on commodity hardware such as personal cell phones and computers, without the need for large computational resources. By collaborating with climate scientists and domain experts from the NASA Jet Propulsion Laboratory and NASA Ames Research Center, we published more than 2 petabytes of climate data via our interactive dashboards for climate scientists and the general public.
Ultimately, our solution fosters quicker decision-making, greater collaboration, and innovation in the global climate science community by breaking down barriers imposed by hardware limitations and geographical constraints and allowing access to sophisticated visualization tools via publicly available dashboards.

M. Taufer, H. Martinez, J. Luettgau, L. Whitnah, G. Scorzelli, P. Newel, A. Panta, T. Bremer, D. Fils, C.R. Kirkpatrick, N. McCurdy, V. Pascucci. “Integrating FAIR Digital Objects (FDOs) into the National Science Data Fabric (NSDF) to Revolutionize Dataflows for Scientific Discovery,” In Computing in Science & Engineering, IEEE, 2024.

ABSTRACT

In this perspective paper, we introduce a paradigm-shifting approach that combines the power of FAIR Digital Objects (FDO) with the National Science Data Fabric (NSDF), defining a new era of data accessibility, scientific discovery, and education. Integrating FDOs into the NSDF opens doors to overcoming substantial data access barriers and facilitating the extraction of machine-actionable metadata aligned with FAIR principles. Our augmented NSDF empowers the exchange of massive climate simulations and streamlines materials science workflows. This paper lays the foundation for an inclusive, web-centric, and network-first design, democratizing data access and fostering unprecedented opportunities for research and collaboration within the scientific community.

M. Taufer, J. Marquez, H. Martinez, A. Gooch, A. Panta, G. Scorzelli, P. Olaya, V. Pascucci. “Leveraging National Science Data Fabric Services to Train Data Scientists,” Subtitled “Workshop on Education for High Performance Computing,” In The International Conference for High Performance Computing, Networking, Storage, and Analysis, IEEE, 2024.

ABSTRACT

We document an interactive half-day tutorial in which participants explore the advanced applications of National Science Data Fabric (NSDF) services and strategies for comprehensive scientific data analysis. Targeting researchers, students, developers, and scientists, the tutorial provides valuable insights into managing and analyzing large datasets, particularly those exceeding 100 TB. Participants gain hands-on experience by constructing modular workflows, leveraging public and private data storage and streaming solutions, and deploying sophisticated visualization and analysis dashboards. The tutorial emphasizes NSDF’s role in supporting visualization conference themes by providing scalable visualization and visual analytics solutions. Our tutorial includes an overview of NSDF’s capabilities, addressing common data analysis challenges, and intermediate hands-on exercises using NSDF services for Earth science data. Advanced applications cover handling and visualizing massive datasets requiring high-resolution data management. By the end of the session, attendees have a deeper understanding of integrating NSDF services into their research workflows, enhancing data accessibility, sharing, and collaborative scientific discovery. Our tutorial aims to advance knowledge in data-intensive computing and empower participants to harness the full potential of NSDF in their respective fields.

2023

D. Hoang, H. Bhatia, P. Lindstrom, V. Pascucci. “Progressive Tree-Based Compression of Large-Scale Particle Data,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1–18. 2023.
DOI: 10.1109/TVCG.2023.3260628

ABSTRACT

Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.

S. Leventhal, A. Gyulassy, M. Heimann, V. Pascucci. “Exploring Classification of Topological Priors with Machine Learning for Feature Extraction,” In IEEE Transactions on Visualization and Computer Graphics, pp. 1–12. 2023.

ABSTRACT

In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects allows researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first, create geometric priors, and then apply machine learning to classify. This approach is empirically motivated since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this paper, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, with similar accuracy, improved execution time, and minimal training data.

Z. Li, S. Liu, K. Bhavya, T. Bremer, V. Pascucci. “Instance-wise Linearization of Neural Network for Model Interpretation,” Subtitled “arXiv:2310.16295v1,” 2023.

ABSTRACT

Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models is still a major bottleneck to deploying such techniques in our daily lives. The challenge stems from the non-linear behavior of neural networks, which raises a critical question: how does a model use input features to make a decision? The classical approach to this challenge is feature attribution, which assigns an importance score to each input feature and reveals its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how it is actually processed by the model internally. These attribution approaches therefore raise the concern of whether they highlight the correct features for a model prediction.

For a neural network model, the non-linear behavior is often caused by the model's non-linear activation units. However, the computation behind a single prediction of a neural network model is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation process of a neural network prediction. This approach reformulates the different layers of convolutional neural networks as linear matrix multiplications. Aggregating the computation of all layers, the complex convolutional neural network operations behind a prediction can be described as a linear matrix multiplication F(x) = W·x + b. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to a prediction. Furthermore, we discuss the application of this technique in both supervised classification and unsupervised neural network learning for parametric t-SNE dimension reduction.
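The observation that a single prediction follows a single activation pattern can be checked on a toy model. The sketch below assumes a small fully connected ReLU network with random weights rather than the convolutional networks the paper targets, and the helper names `forward` and `linearize` are hypothetical. It folds each layer's affine map and the ReLU's 0/1 mask into one matrix `W` and bias `b`, so that F(x) = W·x + b holds exactly for that input.

```python
import numpy as np


def forward(x, layers):
    """Ordinary forward pass of a ReLU MLP; the last layer is linear (no ReLU)."""
    h = x
    for i, (A, c) in enumerate(layers):
        h = A @ h + c
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


def linearize(x, layers):
    """Collapse the network into (W, b) valid for the activation pattern of input x."""
    W = np.eye(x.shape[0])
    b = np.zeros(x.shape[0])
    h = x  # actual activations, used only to read off the ReLU pattern
    for i, (A, c) in enumerate(layers):
        # Compose the affine layer with the accumulated map: A (W x + b) + c
        W = A @ W
        b = A @ b + c
        h = A @ h + c
        if i < len(layers) - 1:
            # For this input, ReLU acts as a fixed diagonal 0/1 mask
            mask = (h > 0).astype(float)
            W = mask[:, None] * W
            b = mask * b
            h = mask * h
    return W, b


rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), rng.standard_normal(8)),
          (rng.standard_normal((8, 8)), rng.standard_normal(8)),
          (rng.standard_normal((3, 8)), rng.standard_normal(3))]
x = rng.standard_normal(4)
W, b = linearize(x, layers)
print(np.allclose(W @ x + b, forward(x, layers)))  # True
```

Because the output is linear in x for this activation pattern, the elementwise product `W[k] * x` splits output k into per-feature contributions that sum to `F(x)[k] - b[k]`; a different input generally activates a different pattern and therefore yields a different (W, b).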

S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, P.T. Bremer. “AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making,” Subtitled “arXiv preprint arXiv:2312.04494,” 2023.

ABSTRACT

With recent advances in multi-modal foundation models, the previously text-only large language models (LLMs) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as virtual visualization assistants for domain experts who may lack the knowledge or expertise to fine-tune visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choice of appropriate visualization parameters requires the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, paving the way for developing expert-level visualization agents in the future.

J. Luettgau, G. Scorzelli, V. Pascucci, M. Taufer. “Development of Large-Scale Scientific Cyberinfrastructure and the Growing Opportunity to Democratize Access to Platforms and Data,” In Distributed, Ambient and Pervasive Interactions, Springer Nature Switzerland, pp. 378–389. 2023.
ISBN: 978-3-031-34668-2
DOI: 10.1007/978-3-031-34668-2_25

ABSTRACT

As researchers across scientific domains rapidly adopt advanced scientific computing methodologies, access to advanced cyberinfrastructure (CI) becomes a critical requirement in scientific discovery. Lowering the entry barriers to CI is a crucial challenge in interdisciplinary sciences requiring frictionless software integration, data sharing from many distributed sites, and access to heterogeneous computing platforms. In this paper, we explore how the challenge is not merely a matter of the availability and affordability of computing, network, and storage technologies but rather the result of insufficient interfaces with an increasingly heterogeneous mix of computing technologies and data sources. With more distributed computation and data, scientists, educators, and students must invest their time and effort in coordinating data access and movement, often to the detriment of their scientific research. Investments in the interfaces’ software stack are necessary to help scientists, educators, and students across domains take advantage of advanced computational methods. To this end, we propose developing a science data fabric as the standard scientific discovery interface that seamlessly manages data dependencies within scientific workflows and CI.

J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer. “Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric,” In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023.
DOI: 10.1145/3588195.3595948

ABSTRACT

The National Science Data Fabric (NSDF) is our solution to the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites and NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, which can further improve the efficiency of data sharing.

N. Morrical, S. Zellmann, A. Sahistan, P. Shriwise, V. Pascucci. “Attribute-Aware RBFs: Interactive Visualization of Time Series Particle Volumes Using RT Core Range Queries,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2023.
DOI: 10.1109/TVCG.2023.3327366

ABSTRACT

Smoothed-particle hydrodynamics (SPH) is a mesh-free method used to simulate volumetric media in fluids, astrophysics, and solid mechanics. Visualizing these simulations is problematic because these datasets often contain millions, if not billions, of particles carrying physical attributes and moving over time. Radial basis functions (RBFs) are used to model particles, and overlapping particles are interpolated to reconstruct a high-quality volumetric field; however, this interpolation process is expensive and makes interactive visualization difficult. Existing RBF interpolation schemes do not account for color-mapped attributes and are instead constrained to visualizing just the density field. To address these challenges, we exploit ray tracing cores in modern GPU architectures to accelerate scalar field reconstruction. We use a novel RBF interpolation scheme to integrate per-particle colors and densities, and leverage GPU-parallel tree construction and refitting to quickly update the tree as the simulation animates over time or when the user manipulates particle radii. We also propose a Hilbert reordering scheme to cluster particles together at the leaves of the tree to reduce tree memory consumption. Finally, we reduce the noise of volumetric shadows by adopting a spatiotemporal blue noise sampling scheme. Our method can provide a more detailed and interactive view of these large, volumetric, time-series particle datasets than traditional methods, leading to new insights into these physics simulations.
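Hilbert reordering places particles that are close in space next to each other in memory, so consecutive runs can be grouped into tree leaves. The sketch below is a 2D simplification using the classic iterative Hilbert-curve index. The abstract does not describe how the paper computes its ordering for 3D particles or sizes its leaves, so `hilbert_index`, `hilbert_sort`, the grid resolution `order`, and `leaf_size` are illustrative assumptions.

```python
def hilbert_index(order, x, y):
    """Distance of integer cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant holds s*s cells; (3*rx) ^ ry is the order in which it is visited
        d += s * s * ((3 * rx) ^ ry)
        # Reflect/rotate so the remaining sub-square is traversed in canonical orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(points, order=10):
    """Indices of 2D points in [0, 1)^2, ordered along the Hilbert curve."""
    n = 1 << order

    def key(i):
        px, py = points[i]
        # Quantize each coordinate onto the grid, clamping 1.0 into the last cell
        cx = min(int(px * n), n - 1)
        cy = min(int(py * n), n - 1)
        return hilbert_index(order, cx, cy)

    return sorted(range(len(points)), key=key)


points = [(0.9, 0.1), (0.1, 0.1), (0.1, 0.9)]
ordering = hilbert_sort(points)
leaf_size = 2
leaves = [ordering[i:i + leaf_size] for i in range(0, len(ordering), leaf_size)]
print(ordering)  # [1, 2, 0]
print(leaves)    # [[1, 2], [0]]
```

Sorting by curve distance costs one key computation per particle plus an ordinary sort, and contiguous slices of the result form spatially compact groups that can serve as leaves.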

S. Petruzza, B. Summa, A. Gooch, C.M. Laney, T. Goulden, J. Schreiner, S. Callahan, V. Pascucci. “Interactive Visualization and Portable Image Blending of Massive Aerial Image Mosaics,” In IEEE International Conference on Big Data, IEEE, pp. 3365–3370. 2023.

ABSTRACT

Processing, managing, and publishing the substantial volume of data collected through modern remote sensing technologies in a format that is easy for researchers (across broad skill levels and scientific domains) to view and use presents a formidable challenge. As a prime example, the massive scale of image mosaics produced by NEON’s Airborne Observation Platform (AOP), often several to hundreds of gigabytes in volume, demands efficient data management strategies. Additionally, these aerial mosaics frequently exhibit seams due to variations in lighting conditions during the data acquisition process. These seams undermine the integrity of subsequent scientific analyses, introducing distortions that hinder accurate interpretation of ecological patterns. Finally, one of NEON’s core objectives is to make these data broadly accessible to users, including those who are not yet versed in working with remote sensing data or who wish to view the datasets without needing to download and process them. In response to these challenges, we have developed a comprehensive data management pipeline that enables interactive access for analysis and visualization of NEON’s aerial mosaic collection. This pipeline automates data ingestion, conversion, and publication in a streamable format, facilitating seamless user interaction through web viewers and programming APIs. Moreover, we have implemented a portable blending algorithm aimed at eliminating these problematic seams from large aerial mosaics. This algorithm, grounded in the Conjugate Gradient (CG) method, has been implemented both in CUDA and using the modern SYCL programming model for enhanced portability across diverse computing platforms. Experimental results demonstrate scalable performance across both CPU and GPU architectures.
This work not only addresses the challenges of large aerial data management and seam removal but also opens avenues for more accurate and comprehensive scientific investigations within the NEON ecosystem.
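The blending step is described only as grounded in the Conjugate Gradient method. For reference, a textbook matrix-free CG solver is sketched below in Python with NumPy. The abstract does not give the linear system being solved, so the 1D discrete Laplacian used here (a symmetric positive definite system of the kind that arises in gradient-domain image blending) is an assumption for illustration, as are the names `conjugate_gradient` and `apply_laplacian`; the paper's CUDA and SYCL implementations are not reproduced.

```python
import numpy as np


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only a function v -> A @ v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)  # residual
    p = r.copy()        # current search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = apply_A(p)
        alpha = rs_old / (p @ Ap)  # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        # New direction is A-conjugate to all previous ones
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def apply_laplacian(v):
    """Matrix-free 1D Dirichlet Laplacian: (A v)_i = 2 v_i - v_{i-1} - v_{i+1}."""
    out = 2.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out


b = np.ones(50)
x = conjugate_gradient(apply_laplacian, b)
print(np.allclose(apply_laplacian(x), b))  # True
```

Passing the operator as a function rather than a stored matrix means only matrix-vector products are needed, which is one reason CG is a common choice for large, image-sized systems and for GPU ports.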

N. Zhou, G. Scorzelli, J. Luettgau, R.R. Kancharla, J. Kane, R. Wheeler, B. Croom, B. Newell, V. Pascucci, M. Taufer. “Orchestration of materials science workflows for heterogeneous resources at large scale,” In The International Journal of High Performance Computing Applications, Sage, 2023.

ABSTRACT

In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage, scaling up compute resources on the cloud. The software structure of our framework has three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and for managing the execution environment. A bottom layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud’s software stack from users who would otherwise need expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.

2022

T. M. Athawale, D. Maljovec, L. Yan, C. R. Johnson, V. Pascucci, B. Wang. “Uncertainty Visualization of 2D Morse Complex Ensembles Using Statistical Summary Maps,” In IEEE Transactions on Visualization and Computer Graphics, Vol. 28, No. 4, pp. 1955–1966. April, 2022.
ISSN: 1077-2626
DOI: 10.1109/TVCG.2020.3022359

ABSTRACT

Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Data uncertainty inherent to scalar fields due to randomness in their acquisition and processing, however, limits our understanding of Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arises from scalar fields coupled with data uncertainty. We propose several statistical summary maps as new entities for quantifying structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce three types of statistical summary maps (the probabilistic map, the significance map, and the survival map) to characterize the uncertain behaviors of gradient flows. We demonstrate the utility of our proposed approach using wind, flow, and ocean eddy simulation datasets.

P.T. Bremer, G. Tourassi, W. Bethel, K. Gaither, V. Pascucci, W. Xu. “Report for the ASCR Workshop on Visualization for Scientific Discovery, Decision-Making, and Communication,” DOE, 2022.

Z. Li, S. Liu, X. Yu, K. Bhavya, J. Cao, J. Diffenderfer, P.T. Bremer, V. Pascucci. “‘Understanding Robustness Lottery’: A Comparative Visual Analysis of Neural Network Pruning Approaches,” Subtitled “arXiv preprint arXiv:2206.07918,” 2022.

ABSTRACT

Deep learning approaches have provided state-of-the-art performance in many applications by relying on extremely large and heavily overparameterized neural networks. However, such networks have been shown to be very brittle, to generalize poorly to new use cases, and to be difficult, if not impossible, to deploy on resource-limited platforms. Model pruning, i.e., reducing the size of the network, is a widely adopted strategy that can lead to more robust and generalizable networks, usually orders of magnitude smaller with the same or even improved performance. While there exist many heuristics for model pruning, our understanding of the pruning process remains limited. Empirical studies show that some heuristics improve performance while others can make models more brittle or have other side effects. This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance. To provide a meaningful comparison and characterization of the model feature space, we use three geometric metrics that are decomposed from the commonly adopted classification loss. With these metrics, we design a visualization system to highlight the impact of pruning on model prediction as well as on the latent feature embedding. The proposed tool provides an environment for exploring and studying differences among pruning methods and between pruned and original models. By leveraging our visualization, ML researchers can not only identify samples that are fragile to model pruning and data corruption but also obtain insights and explanations on how some pruned …

Z. Li, H. Menon, K. Mohror, S. Liu, L. Guo, P.T. Bremer, V. Pascucci. “A Visual Comparison of Silent Error Propagation,” In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2022.
DOI: 10.1109/TVCG.2022.3230636

ABSTRACT

High-performance computing (HPC) systems play a critical role in facilitating scientific discoveries. Their scale and complexity (e.g., the number of computational units and the software stack) continue to grow as new systems are expected to process increasingly more data and reduce computing time. However, with more processing elements, the probability that these systems will experience a random bit-flip error that corrupts a program's output also increases; such errors are often referred to as silent data corruption. Analyzing the resiliency of HPC applications in extreme-scale computing to silent data corruption is crucial but difficult. An HPC application often contains a large number of computation units that need to be tested, and the propagation of such corruption is complex and difficult to interpret. To address this challenge, we propose an interactive visualization system that helps HPC researchers understand the resiliency of HPC applications and compare their error propagation. Our system models an application's error propagation to study a program's resiliency by constructing and visualizing its fault tolerance boundary. Coordinating multiple interactive designs, our system enables domain experts to efficiently explore the complicated spatial and temporal correlation between error propagations. Finally, the system integrates a nonmonotonic error propagation analysis with an adjustable graph propagation visualization to help domain experts examine the details of error propagation and answer such questions as why an error is mitigated or amplified by program execution.

SCIENTIFIC COMPUTING AND IMAGING INSTITUTE at the University of Utah
