Shusen Liu, SCI Institute, University of Utah

The overall interface consists of two views and one data operation panel. These visual components are coordinated to provide a comprehensive view of the data by highlighting its various aspects. They are interconnected such that selections and changes made in one component will be reflected in others. The system is designed to be modular and is easily extendable to include additional visual components.

Embedding View (a): This is the main canvas of the interface where the results of DR, points embedded in 2D, are visualized. It contains a rich set of user interactions for data exploration. One could apply different colormaps to visualize points by values of a particular dimension, clustering labels or point-wise distortion measures. %In the case of visualizing local distortion, the range of a colormap could be further adjusted by a percentage value to accommodate new values that could be potentially out of range.

Parallel Coordinate View (b): This view displays the original data with each of its dimensions as a vertical axis and each point as a line drawing through each of the axis. A normalization of the range for each axis is optional to increase readability of the data.

Data Operation Panel (c): This panel contains various data operations such as DR and clustering. The panel is part of the interlinked system so that changes made to the dataset are instantly reflected through other views. The panel consists of three sub-panels: The meta-information panel gives a direct view of the data, in terms of its dimensions and statistics, and includes the ability to filter (hide) certain dimensions for analysis; The clustering panel allows the user to select distance metrics, data standardization schemes (see supplemental material) and hierarchical clustering methods (e.g. classical single-, average-linkage), while also allowing loading of existing clustering; and the DR panel enables the user to choose DR techniques and specify their parameters in an online fashion.

We illustrate a typical interactive exploration pipeline in above figure.

(a) We apply a certain DR technique to the high-dimensional dataset and obtain its initial embedding, where global distortion measures such as co-ranking could be employed to select a suitable DR and its optimal parameter setting.

(b) We visualize point-wise distortions on the embedding. Regions with high distortions across multiple measures (for example) are identified as regions of interest for further investigation.

(c) We apply hierarchical clustering of the data.

(d) We use point-wise distortions to guide our clustering selection, where the appropriate level of clustering is chosen based on its agreement with the region of interest.

(e) We allow users to move and/or delete a subset of data that belongs to a targeted cluster in the visual space,
where on-the-fly updates of point-wise distortion measures reflect structural relations between different parts of the data.
A decrease/increase in distortion measure of the targeted cluster typically indicates structural independencies/dependencies among the target and its neighboring clusters.

(f) In addition, with detailed parameter analysis across each cluster, we obtain further insights regarding differentiating factors among different regions of the data.

Finally, we obtain a collection of structural insights.

The video from our paper explain the analysis workflow in details:

DataExplorerHD v0.1 Documentation