reVISit: Supporting Scalable Evaluation of Interactive Visualizations
Y. Ding, J. Wilburn, H. Shrestha, A. Ndlovu, K. Gadhave, C. Nobre, A. Lex, L. Harrison. In OSF Preprints, 2023.

reVISit is an open-source software toolkit and framework for creating, deploying, and monitoring empirical visualization studies. Running a quality empirical study in visualization can be demanding and resource-intensive, requiring substantial time, cost, and technical expertise from the research team. These challenges are amplified as research norms trend towards more complex and rigorous study methodologies, alongside a growing need to evaluate more complex interactive visualizations. reVISit aims to ameliorate these challenges by introducing a domain-specific language for study set-up, and a series of software components, such as UI elements, behavior provenance, and an experiment monitoring and management interface. Together with interactive or static stimuli provided by the experimenter, these are compiled into a ready-to-deploy web-based experiment. We demonstrate reVISit's functionality by re-implementing two studies – a graphical perception task and a more complex, interactive study. reVISit is an open-source community project, available at https://revisit.dev/

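The paper centers on a declarative, domain-specific language that reVISit compiles, together with experimenter-supplied stimuli, into a deployable web study. The sketch below conveys that idea only; the field names (studyMetadata, components, sequence) and all values are assumptions chosen for illustration and do not reproduce the actual reVISit configuration schema.

```python
import json

# Hypothetical declarative study specification in the spirit of reVISit's
# set-up language. Field names and values are illustrative assumptions,
# not the real reVISit schema.
study = {
    "studyMetadata": {"title": "Graphical perception replication", "version": "1.0"},
    "components": {
        "consent": {"type": "markdown", "path": "assets/consent.md"},
        "barTrial": {
            "type": "image",                      # static stimulus from the experimenter
            "path": "assets/bar_chart_01.png",
            "response": [
                {"id": "ratio", "type": "numerical",
                 "prompt": "What percentage is the smaller bar of the larger bar?"}
            ],
        },
    },
    "sequence": {"order": "fixed", "components": ["consent", "barTrial"]},
}

# A toolkit of this kind would compile such a specification into a web experiment.
print(json.dumps(study, indent=2))
```
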
Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric
J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer. In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023. DOI: 10.1145/3588195.3595948

The National Science Data Fabric (NSDF) is our solution for addressing the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites and NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, which can further improve the efficiency of data sharing.

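The paper's central idea is to use measured latency and throughput between sites to guide where data is placed and which site serves a transfer. Below is a minimal sketch of that idea under a simple latency-plus-transfer-time cost model; the site names and measurements are made up for illustration and are not NSDF data.

```python
# Pick the source site that minimizes estimated transfer time to a client,
# given measured latency (seconds) and throughput (MB/s) per site.
# Site names and numbers are illustrative, not real NSDF measurements.

sites = {
    "site-a": {"latency_s": 0.020, "throughput_mbps": 400.0},
    "site-b": {"latency_s": 0.090, "throughput_mbps": 900.0},
    "site-c": {"latency_s": 0.005, "throughput_mbps": 120.0},
}

def estimated_transfer_time(site: dict, size_mb: float) -> float:
    """Simple cost model: latency plus size divided by throughput."""
    return site["latency_s"] + size_mb / site["throughput_mbps"]

def best_source(size_mb: float) -> str:
    return min(sites, key=lambda name: estimated_transfer_time(sites[name], size_mb))

print(best_source(0.1))      # tiny transfer: latency dominates (site-c here)
print(best_source(5000.0))   # large transfer: throughput dominates (site-b here)
```
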
AI for Scientific Visualization
C. R. Johnson, H. Shen. In Artificial Intelligence for Science, edited by Alok Choudhary, Geoffrey Fox, and Tony Hey, World Scientific, pp. 535-552. 2023. DOI: 10.1142/9789811265679_0029

FunMC2: A Filter for Uncertainty Visualization of Marching Cubes on Multi-Core Devices
Z. Wang, T. M. Athawale, K. Moreland, J. Chen, C. R. Johnson, D. Pugmire. In Eurographics Symposium on Parallel Graphics and Visualization, 2023. DOI: 10.2312/pgv.20231081

Visualization is an important tool for scientists to extract understanding from complex scientific data. Scientists need to understand the uncertainty inherent in all scientific data in order to interpret the data correctly. Uncertainty visualization has been an active and growing area of research to address this challenge. Algorithms for uncertainty visualization can be expensive, and research efforts have focused mainly on structured grid types. Further, support for uncertainty visualization in production tools is limited. In this paper, we adapt an algorithm for computing key metrics for visualizing uncertainty in Marching Cubes (MC) to multi-core devices and present the design, implementation, and evaluation of a Filter for uncertainty visualization of Marching Cubes on Multi-Core devices (FunMC2). FunMC2 accelerates the uncertainty visualization of MC significantly, and it is portable across multi-core CPUs and GPUs. Evaluation results show that FunMC2 based on OpenMP runs around 11× to 41× faster on multi-core CPUs than the corresponding serial version using one CPU core. FunMC2 based on a single GPU is around 5× to 9× faster than FunMC2 running with OpenMP. Moreover, FunMC2 is flexible enough to process ensemble data with both structured and unstructured mesh types. Furthermore, we demonstrate that FunMC2 can be seamlessly integrated as a plugin into ParaView, a production visualization tool for post-processing.

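The core per-cell computation in this line of work is estimating how likely an isosurface is to pass through each cell, given the spread of vertex values across ensemble members. The NumPy sketch below is a serial illustration that assumes an independent uniform noise model at each vertex; it is not the FunMC2 implementation, which targets multi-core CPUs and GPUs and may compute additional metrics.

```python
import numpy as np

# Per-cell probability that an isosurface crosses a 2D cell, assuming each
# vertex value is independently uniform over its ensemble [min, max] range.
# Serial illustration of the kind of metric the paper parallelizes.

def cell_crossing_probability(ens_min, ens_max, isovalue):
    """ens_min/ens_max: (ny, nx) per-vertex ensemble extrema."""
    # P(vertex value < isovalue) under Uniform(min, max)
    span = np.maximum(ens_max - ens_min, 1e-12)
    p_below = np.clip((isovalue - ens_min) / span, 0.0, 1.0)

    # Gather the four vertex probabilities of every cell.
    corners = np.stack([p_below[:-1, :-1], p_below[:-1, 1:],
                        p_below[1:, :-1],  p_below[1:, 1:]])
    p_all_below = np.prod(corners, axis=0)
    p_all_above = np.prod(1.0 - corners, axis=0)
    return 1.0 - p_all_below - p_all_above   # crossing iff vertices disagree

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(20, 64, 64))          # 20 members on a 64x64 grid
prob = cell_crossing_probability(ensemble.min(0), ensemble.max(0), isovalue=0.5)
print(prob.shape, float(prob.max()))
```
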
A Visual Environment for Data Driven Protein Modeling and Validation
M. Falk, V. Tobiasson, A. Bock, C. Hansen, A. Ynnerman. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1-11. 2023. DOI: 10.1109/TVCG.2023.3286582

In structural biology, validation and verification of new atomic models are crucial and necessary steps which limit the production of reliable molecular models for publications and databases. An atomic model is the result of meticulous modeling and matching and is evaluated using a variety of metrics that provide clues to improve and refine the model so it fits our understanding of molecules and physical constraints. In cryo-electron microscopy (cryo-EM), the validation is also part of an iterative modeling process in which there is a need to judge the quality of the model during the creation phase. A shortcoming is that the process and results of the validation are rarely communicated using visual metaphors. This work presents a visual framework for molecular validation. The framework was developed in close collaboration with domain experts in a participatory design process. Its core is a novel visual representation based on 2D heatmaps that shows all available validation metrics in a linear fashion, presenting a global overview of the atomic model and providing domain experts with interactive analysis tools. Additional information stemming from the underlying data, such as a variety of local quality measures, is used to guide the user's attention toward regions of higher relevance. Linked with the heatmap is a three-dimensional molecular visualization providing the spatial context of the structures and chosen metrics. Additional views of statistical properties of the structure are included in the visual framework. We demonstrate the utility of the framework and its visual guidance with examples from cryo-EM.

Orchestration of materials science workflows for heterogeneous resources at large scale
N. Zhou, G. Scorzelli, J. Luettgau, R.R. Kancharla, J. Kane, R. Wheeler, B. Croom, B. Newell, V. Pascucci, M. Taufer. In The International Journal of High Performance Computing Applications, Sage, 2023.

In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage while scaling up compute resources on the cloud. The multi-layer software structure of our framework includes three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A low layer is dedicated to resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud's software stack from users who would otherwise need expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.

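At the lowest layer, the framework pairs Kubernetes for resource management with Dask for large-scale job scheduling on heterogeneous (GPU and CPU) nodes. The sketch below shows how Dask's worker-resource annotations can route tasks to GPU versus CPU workers; the scheduler address, resource labels, and stand-in task functions are assumptions for illustration, not the paper's code, and it presumes workers were started with matching --resources flags.

```python
# Sketch: routing reconstruction (GPU) and segmentation (CPU) tasks with Dask.
# Assumes a Dask scheduler at SCHEDULER_ADDR with workers started like
#   dask worker SCHEDULER_ADDR --resources "GPU=1"   (GPU nodes)
#   dask worker SCHEDULER_ADDR --resources "CPU=1"   (CPU-only nodes)
# The task functions below stand in for real reconstruction/segmentation code.
from dask.distributed import Client

SCHEDULER_ADDR = "tcp://scheduler:8786"   # placeholder address

def reconstruct(slice_id):      # stand-in for X-ray CT reconstruction
    return f"volume-{slice_id}"

def segment(volume):            # stand-in for crack segmentation
    return f"labels-{volume}"

client = Client(SCHEDULER_ADDR)

volumes = [client.submit(reconstruct, i, resources={"GPU": 1}) for i in range(8)]
labels = [client.submit(segment, v, resources={"CPU": 1}) for v in volumes]
print(client.gather(labels))
```
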
Progressive Tree-Based Compression of Large-Scale Particle Data
D. Hoang, H. Bhatia, P. Lindstrom, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1-18. 2023. DOI: 10.1109/TVCG.2023.3260628

Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while remaining fast and memory-efficient. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.

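The paper introduces particle hierarchies and traversal orders in which early parts of the encoded stream already give a useful approximation of the particle positions. The sketch below is a toy illustration of that coarse-to-fine idea using a k-d-style split and a breadth-first node stream; it does not implement the paper's block-based hierarchy, error-driven traversal, or low-level node encoders.

```python
import numpy as np
from collections import deque

# Toy progressive hierarchy over particle positions: a breadth-first stream of
# (particle count, bounding box) node records from a k-d-style split. Decoding
# any prefix of the stream yields a coarse approximation (e.g., one particle
# per box center), refined as more records arrive.

def progressive_stream(points, lo, hi, max_depth=12):
    stream = []
    queue = deque([(points, np.asarray(lo, float), np.asarray(hi, float), 0)])
    while queue:
        pts, box_lo, box_hi, depth = queue.popleft()
        stream.append((len(pts), box_lo, box_hi))          # one node record
        if len(pts) <= 1 or depth >= max_depth:
            continue
        axis = depth % pts.shape[1]                        # cycle split axes
        mid = 0.5 * (box_lo[axis] + box_hi[axis])
        left, right = pts[pts[:, axis] < mid], pts[pts[:, axis] >= mid]
        lo_r, hi_l = box_lo.copy(), box_hi.copy()
        hi_l[axis], lo_r[axis] = mid, mid
        queue.append((left, box_lo, hi_l, depth + 1))
        queue.append((right, lo_r, box_hi, depth + 1))
    return stream

rng = np.random.default_rng(1)
stream = progressive_stream(rng.random((10_000, 3)), [0, 0, 0], [1, 1, 1])
print(f"{len(stream)} node records; decode any prefix for a coarser reconstruction")
```
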
Protein-metabolite interactomics of carbohydrate metabolism reveal regulation of lactate dehydrogenase
K. G. Hicks, A. A. Cluntun, H. L. Schubert, S. R. Hackett, J. A. Berg, P. G. Leonard, M. A. Ajalla Aleixo, Y. Zhou, A. J. Bott, S. R. Salvatore, F. Chang, A. Blevins, P. Barta, S. Tilley, A. Leifer, A. Guzman, A. Arok, S. Fogarty, J. M. Winter, H. Ahn, K. N. Allen, S. Block, I. A. Cardoso, J. Ding, I. Dreveny, C. Gasper, Q. Ho, A. Matsuura, M. J. Palladino, S. Prajapati, P. Sun, K. Tittmann, D. R. Tolan, J. Unterlass, A. P. VanDemark, M. G. Vander Heiden, B. A. Webb, C. Yun, P. Zhap, B. Wang, F. J. Schopfer, C. P. Hill, M. C. Nonato, F. L. Muller, J. E. Cox, J. Rutter. In Science, Vol. 379, No. 6636, pp. 996-1003. 2023. DOI: 10.1126/science.abm3452

Metabolic networks are interconnected and influence diverse cellular processes. The protein-metabolite interactions that mediate these networks are frequently low affinity and challenging to systematically discover. We developed mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically (MIDAS) to identify such interactions. Analysis of 33 enzymes from human carbohydrate metabolism identified 830 protein-metabolite interactions, including known regulators, substrates, and products as well as previously unreported interactions. We functionally validated a subset of interactions, including the isoform-specific inhibition of lactate dehydrogenase by long-chain acyl–coenzyme A. Cell treatment with fatty acids caused a loss of pyruvate-lactate interconversion dependent on lactate dehydrogenase isoform expression. These protein-metabolite interactions may contribute to the dynamic, tissue-specific metabolic flexibility that enables growth and survival in an ever-changing nutrient environment.

Understanding how metabolic state influences cellular processes requires systematic analysis of low-affinity interactions of metabolites with proteins. Hicks et al. describe a method called MIDAS (mass spectrometry integrated with equilibrium dialysis for the discovery of allostery systematically), which allowed them to probe such interactions for 33 enzymes of human carbohydrate metabolism and more than 400 metabolites. The authors detected many known and many new interactions, including regulation of lactate dehydrogenase by ATP and long-chain acyl coenzyme A, which may help to explain known physiological relations between fat and carbohydrate metabolism in different tissues. —LBR A mass spectrometry and dialysis method detects metabolite-protein interactions that help to control physiology.

Accelerated Probabilistic Marching Cubes by Deep Learning for Time-Varying Scalar Ensembles
M. Han, T.M. Athawale, D. Pugmire, C.R. Johnson. In 2022 IEEE Visualization and Visual Analytics (VIS), IEEE, pp. 155-159. 2022. DOI: 10.1109/VIS54862.2022.00040

Visualizing the uncertainty of ensemble simulations is challenging due to the large size and multivariate and temporal features of ensemble data sets. One popular approach to studying the uncertainty of ensembles is analyzing the positional uncertainty of the level sets. Probabilistic marching cubes is a technique that performs Monte Carlo sampling of multivariate Gaussian noise distributions for positional uncertainty visualization of level sets. However, the technique suffers from high computational time, making interactive visualization and analysis impossible to achieve. This paper introduces a deep-learning-based approach to learning the level-set uncertainty for two-dimensional ensemble data with a multivariate Gaussian noise assumption. We train the model using the first few time steps from time-varying ensemble data in our workflow. We demonstrate that our trained model accurately infers uncertainty in level sets for new time steps and is up to 170× faster than the original probabilistic model with serial computation and 10× faster than the original parallel computation.

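The network is trained to reproduce the output of probabilistic marching cubes, which estimates per-cell level-crossing probabilities by Monte Carlo sampling a multivariate Gaussian fitted to the ensemble values at a cell's vertices. Below is a minimal NumPy sketch of that baseline computation for a single 2D cell; the toy data and sample count are assumptions, and the paper's contribution is replacing this sampling loop with a trained model for speed.

```python
import numpy as np

# Monte Carlo estimate of the level-crossing probability for one 2D cell:
# fit a multivariate Gaussian to the four vertex values across ensemble
# members, sample it, and count how often the samples straddle the isovalue.

rng = np.random.default_rng(0)
ensemble_cell = rng.normal(loc=[0.2, 0.4, 0.6, 0.8], scale=0.3, size=(50, 4))

mean = ensemble_cell.mean(axis=0)
cov = np.cov(ensemble_cell, rowvar=False)

def crossing_probability(mean, cov, isovalue, n_samples=10_000):
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    above = samples > isovalue
    crossed = above.any(axis=1) & (~above).any(axis=1)   # vertices disagree
    return crossed.mean()

print(crossing_probability(mean, cov, isovalue=0.5))
```
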
Adaptive elasticity policies for staging-based in situ visualization
Z. Wang, M. Dorier, P. Subedi, P.E. Davis, M. Parashar. In Future Generation Computer Systems, 2022. ISSN: 0167-739X. DOI: 10.1016/j.future.2022.12.010

In situ processing aims to alleviate the growing gap between computation and I/O capabilities by performing data processing close to the data source. In situ processing is widely used to process data generated by multiple data sources, including observation data from edge devices or scientific observational facilities and the simulation data generated by scientific computation on a high-performance computing (HPC) platform. For a scientific workflow that is run on an HPC platform and composed of a simulation program and an in situ data analytics or visualization (abbreviated as ana/vis) task, there is an implicit assumption that the computing resources assigned to the workflow remain static during workflow execution. However, with the converging trend between HPC and cloud computing platforms, running the in situ ana/vis task elastically is a promising way to decrease its overhead and improve its resource utilization. Resource elasticity represents the ability to change resource configurations, such as the number of computing nodes/processes, during workflow execution. An elastic job may dynamically adjust resource configurations; it may use a few resources at the beginning and more resources toward the end of the job when interesting data appear. However, it is hard to predict a priori how many computing nodes/processes need to be added/removed during the workflow execution to adapt to changing workflow needs. How to efficiently guide elasticity operations, such as growing or shrinking the number of processes used for in situ analysis during workflow execution, is an open research question. In this article, we present adaptive elasticity policies that adopt workflow runtime information collected during workflow execution to predict how to trigger the addition/removal of processes in order to minimize in situ processing overhead. Taking in situ visualization tasks as an example, we integrate the presented elasticity policies into a staging-based elastic workflow and evaluate its efficiency in multiple elasticity scenarios. Compared with the situation without elasticity or with a static elasticity policy that uses a fixed number of processes for each rescaling operation, the adaptive elasticity policy can save overhead in finding a proper resource configuration and improve resource utilization efficiency. For example, one experiment illustrates that the adaptive elasticity policy saves 41% of core-hours compared with the case without resource elasticity.

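The adaptive policies use runtime information collected during execution to decide when, and by how much, to grow or shrink the number of analysis processes. A minimal sketch of one such feedback rule follows, assuming the staging service reports the measured analysis time per simulation step and that analysis scales roughly linearly with process count; the thresholds and scaling rule are illustrative assumptions, not the policies evaluated in the article.

```python
# Toy feedback policy: grow/shrink the number of in situ analysis processes so
# that measured analysis time per simulation step tracks a target budget.
# Thresholds and the proportional rule are illustrative assumptions.

def next_process_count(current_procs, analysis_time_s, target_time_s,
                       min_procs=1, max_procs=128, tolerance=0.15):
    ratio = analysis_time_s / target_time_s
    if abs(ratio - 1.0) <= tolerance:
        return current_procs                      # within budget, do nothing
    # Assume near-linear scaling of analysis time with process count.
    proposed = round(current_procs * ratio)
    return max(min_procs, min(max_procs, proposed))

procs = 8
for step_time in [12.0, 9.5, 6.1, 4.8, 5.2]:      # measured seconds per step
    procs = next_process_count(procs, step_time, target_time_s=5.0)
    print(f"analysis took {step_time:.1f}s -> use {procs} processes next")
```
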
A Visual Comparison of Silent Error Propagation
Z. Li, H. Menon, K. Mohror, S. Liu, L. Guo, P.T. Bremer, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2022. DOI: 10.1109/TVCG.2022.3230636

High-performance computing (HPC) systems play a critical role in facilitating scientific discoveries. Their scale and complexity (e.g., the number of computational units and software stack) continue to grow as new systems are expected to process increasingly more data and reduce computing time. However, with more processing elements, the probability that these systems will experience a random bit-flip error that corrupts a program's output also increases, which is often recognized as silent data corruption. Analyzing the resiliency of HPC applications to silent data corruption in extreme-scale computing is crucial but difficult. An HPC application often contains a large number of computation units that need to be tested, and error propagation caused by such corruption is complex and difficult to interpret. To address this challenge, we propose an interactive visualization system that helps HPC researchers understand the resiliency of HPC applications and compare their error propagation. Our system models an application's error propagation to study a program's resiliency by constructing and visualizing its fault tolerance boundary. Coordinating multiple interactive designs, our system enables domain experts to efficiently explore the complicated spatial and temporal correlation between error propagations. Finally, the system integrates a nonmonotonic error propagation analysis with an adjustable graph propagation visualization to help domain experts examine the details of error propagation and answer questions such as why an error is mitigated or amplified by program execution.

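Silent data corruption here refers to a random bit flip that silently changes a program value without crashing the run. The sketch below illustrates the kind of fault injection used to study such errors by flipping one random bit of a double-precision value; it only demonstrates the studied phenomenon and is unrelated to the paper's visualization system.

```python
import random
import struct

# Flip one random bit of a 64-bit float to emulate a silent data corruption,
# then observe how far the corrupted value drifts from the original.

def flip_random_bit(x: float, rng: random.Random) -> float:
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits ^= 1 << rng.randrange(64)                 # flip one of the 64 bits
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

rng = random.Random(42)
original = 3.141592653589793
for _ in range(5):
    corrupted = flip_random_bit(original, rng)
    print(f"{corrupted!r}  (abs error {abs(corrupted - original):.3e})")
```
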