T. M. Athawale, D. Maljovec. L. Yan, C. R. Johnson, V. Pascucci, B. Wang.
Uncertainty Visualization of 2D Morse Complex Ensembles Using Statistical Summary Maps, In IEEE Transactions on Visualization and Computer Graphics, Vol. 28, No. 4, pp. 1955-1966. April, 2022.
Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Data uncertainty inherent to scalar fields due to randomness in their acquisition and processing, however, limits our understanding of Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arises from scalar fields coupled with data uncertainty. We propose several statistical summary maps as new entities for quantifying structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce three types of statistical summary maps – the probabilistic map , the significance map , and the survival map – to characterize the uncertain behaviors of gradient flows. We demonstrate the utility of our proposed approach using wind, flow, and ocean eddy simulation datasets.
Z. Li, S. Liu, X. Yu, K. Bhavya, J. Cao, J. Diffenderfer, P.T. Bremer, V. Pascucci. “Understanding Robustness Lottery”: A Comparative Visual Analysis of Neural Network Pruning Approaches, Subtitled arXiv preprint arXiv:2206.07918, 2022.
Deep learning approaches have provided state-of-the-art performance in many applications by relying on extremely large and heavily overparameterized neural networks. However, such networks have been shown to be very brittle, not generalize well to new uses cases, and are often difficult if not impossible to deploy on resources limited platforms. Model pruning, i.e., reducing the size of the network, is a widely adopted strategy that can lead to more robust and generalizable network -- usually orders of magnitude smaller with the same or even improved performance. While there exist many heuristics for model pruning, our understanding of the pruning process remains limited. Empirical studies show that some heuristics improve performance while others can make models more brittle or have other side effects. This work aims to shed light on how different pruning methods alter the network's internal feature representation, and the corresponding impact on model performance. To provide a meaningful comparison and characterization of model feature space, we use three geometric metrics that are decomposed from the common adopted classification loss. With these metrics, we design a visualization system to highlight the impact of pruning on model prediction as well as the latent feature embedding. The proposed tool provides an environment for exploring and studying differences among pruning methods and between pruned and original model. By leveraging our visualization, the ML researchers can not only identify samples that are fragile to model pruning and data corruption but also obtain insights and explanations on how some pruned …
Y. Livnat, D. Maljovec, A. Gyulassy, B. Mouginot, V. Pascucci. A Novel Tree Visualization to Guide Interactive Exploration of Multi-dimensional Topological Hierarchies, Subtitled arXiv preprint arXiv:2208.06952, 2022.
Understanding the response of an output variable to multi-dimensional inputs lies at the heart of many data exploration endeavours. Topology-based methods, in particular Morse theory and persistent homology, provide a useful framework for studying this relationship, as phenomena of interest often appear naturally as fundamental features. The Morse-Smale complex captures a wide range of features by partitioning the domain of a scalar function into piecewise monotonic regions, while persistent homology provides a means to study these features at different scales of simplification. Previous works demonstrated how to compute such a representation and its usefulness to gain insight into multi-dimensional data. However, exploration of the multi-scale nature of the data was limited to selecting a single simplification threshold from a plot of region count. In this paper, we present a novel tree visualization that provides a concise overview of the entire hierarchy of topological features. The structure of the tree provides initial insights in terms of the distribution, size, and stability of all partitions. We use regression analysis to fit linear models in each partition, and develop local and relative measures to further assess uniqueness and the importance of each partition, especially with respect parents/children in the feature hierarchy. The expressiveness of the tree visualization becomes apparent when we encode such measures using colors, and the layout allows an unprecedented level of control over feature selection during exploration. For instance, selecting features from multiple scales of the hierarchy enables a more nuanced exploration. Finally, we …
We propose a simple, yet effective method for clustering finite elements in order to improve preprocessing times and rendering performance of unstructured volumetric grids. Rather than building bounding volume hierarchies (BVHs) over individual elements, we sort elements along a Hilbert curve and aggregate neighboring elements together, significantly improving BVH memory consumption. Then to further reduce memory consumption, we cluster the mesh on the fly into sub-meshes with smaller indices using series of efficient parallel mesh re-indexing operations. These clusters are then passed to a highly optimized ray tracing API for both point containment queries and ray-cluster intersection testing. Each cluster is assigned a maximum extinction value for adaptive sampling, which we rasterize into non-overlapping view-aligned bins allocated along the ray. These maximum extinction bins are then used to guide the placement of samples along the ray during visualization, significantly reducing the number of samples required and greatly improving overall visualization interactivity. Using our approach, we improve rendering performance over a competitive baseline on the NASA Mars Lander dataset by 6×(1FPS up to 6FPS including volumetric shadows) while simultaneously reducing memory consumption by 3×(33GB down to 11GB) and avoiding any offline preprocessing steps, enabling high quality interactive visualization on consumer graphics cards. By utilizing the full 48 GB of an RTX 8000, we improve performance of Lander by 17×(1FPS up to 17FPS), enabling new possibilities for large data exploration.
This work presents NSDF-FUSE, a testbed for evaluating settings and performance of FUSE-based file systems on top of S3-compatible object storage; the testbed is part of a suite of services from the National Science Data Fabric (NSDF) project (an NSF-funded project that is delivering cyberinfrastructures for data scientists). We demonstrate how NSDF-FUSE can be deployed to evaluate eight different mapping packages that mount S3-compatible object storage to a file system, as well as six data patterns representing different I/O operations on two cloud platforms. NSDF-FUSE is open-source and can be easily extended to run with other software mapping packages and different cloud platforms.
Scalable CPU Ray Tracing for In Situ Visualization Using OSPRay, In In Situ Visualization for Computational Science, Springer International Publishing, pp. 353--374. 2022.
In situ visualization increasingly involves rendering large numbers of images for post hoc exploration. As both the number of images to be rendered and the data being rendered are large, the scalability of the rendering component is of key concern. Furthermore, the renderer must be able to support a wide range of data distributions, simulation configurations, and HPC systems to provide the flexibility required for a portable, general purpose in situ rendering package. In this chapter, we discuss recent developments in OSPRay’s support for MPI-parallel applications to provide a flexible and scalable rendering API, with a focus on how these developments can be applied to enable scalable, high-quality in situ visualization.
H. Bhatia, S. N. Petruzza, R. Anirudh, A. G. Gyulassy, R. M. Kirby, V. Pascucci, P. T. Bremer. Data-Driven Estimation of Temporal-Sampling Errors in Unsteady Flows, 2021.
While computer simulations typically store data at the highest available spatial resolution, it is often infeasible to do so for the temporal dimension. Instead, the common practice is to store data at regular intervals, the frequency of which is strictly limited by the available storage and I/O bandwidth. However, this manner of temporal subsampling can cause significant errors in subsequent analysis steps. More importantly, since the intermediate data is lost, there is no direct way of measuring this error after the fact. One particularly important use case that is affected is the analysis of unsteady flows using pathlines, as it depends on an accurate interpolation across time. Although the potential problem with temporal undersampling is widely acknowledged, there currently does not exist a practical way to estimate the potential impact. This paper presents a simple-to-implement yet powerful technique to estimate the error in pathlines due to temporal subsampling. Given an unsteady flow, we compute pathlines at the given temporal resolution as well as subsamples thereof. We then compute the error induced due to various levels of subsampling and use it to estimate the error between the given resolution and the unknown ground truth. Using two turbulent flows, we demonstrate that our approach, for the first time, provides an accurate, a posteriori error estimate for pathline computations. This estimate will enable scientists to better understand the uncertainties involved in pathline-based analysis techniques and can lead to new uncertainty visualization approaches using the predicted errors.
H. Bhatia, D. Hoang, N. Morrical, V. Pascucci, P.T. Bremer, P. Lindstrom. AMM: Adaptive Multilinear Meshes, Subtitled arXiv:2007.15219, 2021.
Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision datastreams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes
E. Deelman, A. Mandal, A. P. Murillo, J. Nabrzyski, V. Pascucci, R. Ricci, I. Baldin, S. Sons, L. Christopherson, C. Vardeman, R. F. da Silva, J. Wyngaard, S. Petruzza, M. Rynge, K. Vahi, W. R. Whitcup, J. Drake, E. Scott. Blueprint: Cyberinfrastructure Center of Excellence, Subtitled arXiv, 2021.
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand MFs' CI, the Pilot has developed and validated a model of the MF data lifecycle that follows the data generation and management within a facility and gained an understanding of how this model captures the fundamental stages that the facilities' data passes through from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a …
A. A. Gooch, S. Petruzza, A. Gyulassy, G. Scorzelli, V. Pascucci, L. Rantham, W. Adcock, C. Coopmans.
Lessons learned towards the immediate delivery of massive aerial imagery to farmers and crop consultants, In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping VI, Vol. 11747, International Society for Optics and Photonics, pp. 22 -- 34. 2021.
In this paper, we document lessons learned from using ViSOAR Ag Explorer™ in the fields of Arkansas and Utah in the 2018-2020 growing seasons. Our insights come from creating software with fast reading and writing of 2D aerial image mosaics for platform-agnostic collaborative analytics and visualization. We currently enable stitching in the field on a laptop without the need for an internet connection. The full resolution result is then available for instant streaming visualization and analytics via Python scripting. While our software, ViSOAR Ag Explorer™ removes the time and labor software bottleneck in processing large aerial surveys, enabling a cost-effective process to deliver actionable information to farmers, we learned valuable lessons with regard to the acquisition, storage, viewing, analysis, and planning stages of aerial data surveys. Additionally, with the ultimate goal of stitching thousands of images in minutes on board a UAV at the time of data capture, we performed preliminary tests for on-board, real-time stitching and analysis on USU AggieAir sUAS using lightweight computational resources. This system is able to create a 2D map while flying and allow interactive exploration of the full resolution data as soon as the platform has landed or has access to a network. This capability further speeds up the assessment process on the field and opens opportunities for new real-time photogrammetry applications. Flying and imaging over 1500-2000 acres per week provides up-to-date maps that give crop consultants a much broader scope of the field in general as well as providing a better view into planting and field preparation than could be observed from field level. Ultimately, our software and hardware could provide a much better understanding of weed presence and intensity or lack thereof.
X. Huang, P. Klacansky, S. Petruzza, A. Gyulassy, P.T. Bremer, V. Pascucci. Distributed merge forest: a new fast and scalable approach for topological analysis at scale, In Proceedings of the ACM International Conference on Supercomputing, pp. 367-377. 2021.
Topological analysis is used in several domains to identify and characterize important features in scientific data, and is now one of the established classes of techniques of proven practical use in scientific computing. The growth in parallelism and problem size tackled by modern simulations poses a particular challenge for these approaches. Fundamentally, the global encoding of topological features necessitates inter process communication that limits their scaling. In this paper, we extend a new topological paradigm to the case of distributed computing, where the construction of a global merge tree is replaced by a distributed data structure, the merge forest, trading slower individual queries on the structure for faster end-to-end performance and scaling. Empirically, the queries that are most negatively affected also tend to have limited practical use. Our experimental results demonstrate the scalability of both the merge forest construction and the parallel queries needed in scientific workflows, and contrast this scalability with the two established alternatives that construct variations of a global tree.
Z. Li, H. Menon, K. Mohror, P. T. Bremer, Y. Livant, V. Pascucci. Understanding a program's resiliency through error propagation, In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 362-373. 2021.
Aggressive technology scaling trends have worsened the transient fault problem in high-performance computing (HPC) systems. Some faults are benign, but others can lead to silent data corruption (SDC), which represents a serious problem; a fault introducing an error that is not readily detected nto an HPC simulation. Due to the insidious nature of SDCs, researchers have worked to understand their impact on applications. Previous studies have relied on expensive fault injection campaigns with uniform sampling to provide overall SDC rates, but this solution does not provide any feedback on the code regions without samples.
T. McDonald, R. Shrestha, X. Yi, H. Bhatia, D. Chen, D. Goswami, V. Pascucci, T. Turbyville, P‐T Bremer. Leveraging Topological Events in Tracking Graphs for Understanding Particle Diffusion, In Computer Graphics Forum, Vol. 40, No. 3, pp. 251-262. 2021.
Single particle tracking (SPT) of fluorescent molecules provides significant insights into the diffusion and relative motion of tagged proteins and other structures of interest in biology. However, despite the latest advances in high-resolution microscopy, individual particles are typically not distinguished from clusters of particles. This lack of resolution obscures potential evidence for how merging and splitting of particles affect their diffusion and any implications on the biological environment. The particle tracks are typically decomposed into individual segments at observed merge and split events, and analysis is performed without knowing the true count of particles in the resulting segments. Here, we address the challenges in analyzing particle tracks in the context of cancer biology. In particular, we study the tracks of KRAS protein, which is implicated in nearly 20% of all human cancers, and whose clustering and aggregation have been linked to the signaling pathway leading to uncontrolled cell growth. We present a new analysis approach for particle tracks by representing them as tracking graphs and using topological events – merging and splitting, to disambiguate the tracks. Using this analysis, we infer a lower bound on the count of particles as they cluster and create conditional distributions of diffusion speeds before and after merge and split events. Using thousands of time-steps of simulated and in-vitro SPT data, we demonstrate the efficacy of our method, as it offers the biologists a new, detailed look into the relationship between KRAS clustering and diffusion speeds.
N. Morrical, J. Tremblay, Y. Lin, S. Tyree, S. Birchfield, V. Pascucci, I. Wald. NViSII: A Scriptable Tool for Photorealistic Image Generation, Subtitled arXiv preprint arXiv:2105.13962, 2021.
We present a Python-based renderer built on NVIDIA's OptiX ray tracing engine and the OptiX AI denoiser, designed to generate high-quality synthetic images for research in computer vision and deep learning. Our tool enables the description and manipulation of complex dynamic 3D scenes containing object meshes, materials, textures, lighting, volumetric data (e.g., smoke), and backgrounds. Metadata, such as 2D/3D bounding boxes, segmentation masks, depth maps, normal maps, material properties, and optical flow vectors, can also be generated. In this work, we discuss design goals, architecture, and performance. We demonstrate the use of data generated by path tracing for training an object detector and pose estimator, showing improved performance in sim-to-real transfer in situations that are difficult for traditional raster-based renderers. We offer this tool as an easy-to-use, performant, high-quality renderer for advancing research in synthetic data generation and deep learning.
W. Usher, X. Huang, S. Petruzza, S. Kumar, S. R. Slattery, S. T. Reeve, F. Wang, C. R. Johnson,, V. Pascucci. Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts, In IPDPS, 2021.
A. Venkat, A. Gyulassy, G. Kosiba, A. Maiti, H. Reinstein, R. Gee, P.-T. Bremer, V. Pascucci. Towards replacing physical testing of granular materials with a Topology-based Model, Subtitled arXiv preprint arXiv:2109.08777, 2021.
In the study of packed granular materials, the performance of a sample (e.g., the detonation of a high-energy explosive) often correlates to measurements of a fluid flowing through it. The "effective surface area," the surface area accessible to the airflow, is typically measured using a permeametry apparatus that relates the flow conductance to the permeable surface area via the Carman-Kozeny equation. This equation allows calculating the flow rate of a fluid flowing through the granules packed in the sample for a given pressure drop. However, Carman-Kozeny makes inherent assumptions about tunnel shapes and flow paths that may not accurately hold in situations where the particles possess a wide distribution in shapes, sizes, and aspect ratios, as is true with many powdered systems of technological and commercial interest. To address this challenge, we replicate these measurements virtually on micro-CT images of the powdered material, introducing a new Pore Network Model based on the skeleton of the Morse-Smale complex. Pores are identified as basins of the complex, their incidence encodes adjacency, and the conductivity of the capillary between them is computed from the cross-section at their interface. We build and solve a resistive network to compute an approximate laminar fluid flow through the pore structure. We provide two means of estimating flow-permeable surface area: (i) by direct computation of conductivity, and (ii) by identifying dead-ends in the flow coupled with isosurface extraction and the application of the Carman-Kozeny equation, with the aim of establishing consistency over a range of particle shapes, sizes, porosity levels, and void distribution patterns.
Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Noise inherent to scalar field data due to acquisitions and processing, however, limits our understanding of the Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arise from scalar fields coupled with data uncertainty. We propose statistical summary maps as new entities for capturing structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce two types of statistical summary maps -- the Probabilistic Map and the Survival Map -- to characterize the uncertain behaviors of local extrema and local gradient flows, respectively. We demonstrate the utility of our proposed approach using synthetic and real-world datasets.
The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over fifty experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.
L. Cinquini, S. Petruzza, Jason J. Boutte, S. Ames, G. Abdulla, V. Balaji, R. Ferraro, A. Radhakrishnan, L. Carriere, T. Maxwell, G. Scorzelli, V. Pascucci. Distributed Resources for the Earth System Grid Advanced Management (DREAM), Final Report, 2020.
The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation ) architecture which would be suitable for managing and accessing data and services resources on a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were rather primitive. At the beginning, the team had the general notion that a better ESGF architecture could be built by modularizing each component, and redefining its interaction with other components by defining and exposing a well defined API. Although this was still the high level principle that guided the work, the DREAM project was able to accomplish its goals by leveraging new practices in IT that started just about 3 or 4 years ago: the advent of containerization technologies (specifically, Docker), the development of frameworks to manage containers at scale (Docker Swarm and Kubernetes), and their application to the commercial Cloud. Thanks to these new technologies, DREAM was able to improve the ESGF architecture (including its computing and visualization services) to a level of deployability and scalability beyond the original expectations.
We present dw2, a flexible and easy-to-use software infrastructure for interactive rendering of large tiled display walls. Our library represents the tiled display wall as a single virtual screen through a display "service", which renderers connect to and send image tiles to be displayed, either from an on-site or remote cluster. The display service can be easily configured to support a range of typical network and display hardware configurations; the client library provides a straightforward interface for easy integration into existing renderers. We evaluate the performance of our display wall service in different configurations using a CPU and GPU ray tracer, in both on-site and remote rendering scenarios using multiple display walls.