A. Warner, J. Tate, B. Burton, C.R. Johnson.
A High-Resolution Head and Brain Computer Model for Forward and Inverse EEG Simulation, In bioRxiv, Cold Spring Harbor Laboratory, Feb, 2019.
To conduct computational forward and inverse EEG studies of brain electrical activity, researchers must construct realistic head and brain computer models, which is both challenging and time consuming. The availability of realistic head models and corresponding imaging data is limited in terms of imaging modalities and patient diversity. In this paper, we describe a detailed head modeling pipeline and provide a high-resolution, multimodal, open-source, female head and brain model. The modeling pipeline specifically outlines image acquisition, preprocessing, registration, and segmentation; three-dimensional tetrahedral mesh generation; finite element EEG simulations; and visualization of the model and simulation results. The dataset includes both functional and structural images and EEG recordings from two high-resolution electrode configurations. The intermediate results and software components are also included in the dataset to facilitate modifications to the pipeline. This project will contribute to neuroscience research by providing a high-quality dataset that can be used for a variety of applications and a computational pipeline that may help researchers construct new head models more efficiently.
L. Zhou, D. Weiskopf, C. R. Johnson.
Perceptually guided contrast enhancement based on viewing distance, In Journal of Computer Languages, Vol. 55, Elsevier, pp. 100911. 2019.
We propose an image-space contrast enhancement method for color-encoded visualization. The contrast of an image is enhanced through a perceptually guided approach that interfaces with the user with a single and intuitive parameter of the virtual viewing distance. To this end, we analyze a multiscale contrast model of the input image and test the visibility of bandpass images of all scales at a virtual viewing distance. By adapting weights of bandpass images with a threshold model of spatial vision, this image-based method enhances contrast to compensate for contrast loss caused by viewing the image at a certain distance. Relevant features in the color image can be further emphasized by the user using overcompensation. The weights can be assigned with a simple band-based approach, or with an efficient pixel-based approach that reduces ringing artifacts. The method is efficient and can be integrated into any visualization tool as it is a generic image-based post-processing technique. Using highly diverse datasets, we show the usefulness of perception compensation across a wide range of typical visualizations.
In this paper, we propose a perceptually-guided visualization sharpening technique. We analyze the spectral behavior of an established comprehensive perceptual model to arrive at our approximated model based on an adapted weighting of the bandpass images from a Gaussian pyramid. The main benefit of this approximated model is its controllability and predictability for sharpening color-mapped visualizations. Our method can be integrated into any visualization tool as it adopts generic image-based post-processing, and it is intuitive and easy to use as viewing distance is the only parameter. Using highly diverse datasets, we show the usefulness of our method across a wide range of typical visualizations.
D. N. Anderson, B. Osting, J. Vorwerk, A. D. Dorval, C. R. Butson. Optimized programming algorithm for cylindrical and directional deep brain stimulation electrodes, In Journal of Neural Engineering, Vol. 15, No. 2, pp. 026005. 2018.
Objective. Deep brain stimulation (DBS) is a growing treatment option for movement and psychiatric disorders. As DBS technology moves toward directional leads with increased numbers of smaller electrode contacts, trial-and-error methods of manual DBS programming are becoming too time-consuming for clinical feasibility. We propose an algorithm to automate DBS programming in near real-time for a wide range of DBS lead designs. Approach. Magnetic resonance imaging and diffusion tensor imaging are used to build finite element models that include anisotropic conductivity. The algorithm maximizes activation of target tissue and utilizes the Hessian matrix of the electric potential to approximate activation of neurons in all directions. We demonstrate our algorithm's ability in an example programming case that targets the subthalamic nucleus (STN) for the treatment of Parkinson's disease for three lead designs: the Medtronic 3389 (four cylindrical contacts), the direct STNAcute (two cylindrical contacts, six directional contacts), and the Medtronic-Sapiens lead (40 directional contacts). Main results. The optimization algorithm returns patient-specific contact configurations in near real-time—less than 10 s for even the most complex leads. When the lead was placed centrally in the target STN, the directional leads were able to activate over 50% of the region, whereas the Medtronic 3389 could activate only 40%. When the lead was placed 2 mm lateral to the target, the directional leads performed as well as they did in the central position, but the Medtronic 3389 activated only 2.9% of the STN. Significance. This DBS programming algorithm can be applied to cylindrical electrodes as well as novel directional leads that are too complex with modern technology to be manually programmed. 
This algorithm may reduce clinical programming time and encourage the use of directional leads, since they activate a larger volume of the target area than cylindrical electrodes in central and off-target lead placements.
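The abstract above mentions using the Hessian matrix of the electric potential to approximate neural activation in all directions. As a rough illustration only, the sketch below estimates the Hessian of a scalar potential by central finite differences and evaluates the second derivative along a chosen direction. The function names, the step size `h`, and the analytic test potential are assumptions made for this example; they are not taken from the paper, and this is not the authors' optimization algorithm.

```python
import math


def hessian(potential, point, h=1e-3):
    """Estimate the Hessian of potential(x, y, z) at point by central differences.

    Hypothetical helper for illustration; h is an assumed step size.
    """
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Shift coordinate i by si*h and coordinate j by sj*h.
                # When i == j both shifts apply to the same coordinate,
                # which yields a second difference with step 2h.
                p = list(point)
                p[i] += si * h
                p[j] += sj * h
                return potential(*p)

            H[i][j] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4.0 * h * h)
    return H


def directional_second_derivative(H, direction):
    """Return d^T H d for the unit vector d along the given direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    n = len(d)
    return sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))


def example_potential(x, y, z):
    # Assumed analytic potential, used only to check the finite differences.
    return x * x + 2.0 * y * y + 3.0 * z * z + x * y


H_example = hessian(example_potential, (0.1, 0.2, 0.3))
along_z = directional_second_derivative(H_example, (0.0, 0.0, 2.0))
```

For a fiber oriented along `direction`, a larger second derivative of the potential along that direction is commonly read as a stronger driving term for activation; how the paper combines directions and thresholds is not specified in the abstract.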
The biophysical basis for electrocardiographic evaluation of myocardial ischemia stems from the notion that ischemic tissues develop, with relative uniformity, along the endocardial aspects of the heart. These injured regions of subendocardial tissue give rise to intramural currents that lead to ST segment deflections within electrocardiogram (ECG) recordings. The concept of subendocardial ischemic regions is often used in clinical practice, providing a simple and intuitive description of ischemic injury; however, such a model grossly oversimplifies the presentation of ischemic disease—inadvertently leading to errors in ECG-based diagnoses. Furthermore, recent experimental studies have brought into question the subendocardial ischemia paradigm suggesting instead a more distributed pattern of tissue injury. These findings come from experiments and so have both the impact and the limitations of measurements from living organisms. Computer models have often been employed to overcome the constraints of experimental approaches and have a robust history in cardiac simulation. To this end, we have developed a computational simulation framework aimed at elucidating the effects of ischemia on measurable cardiac potentials. To validate our framework, we simulated, visualized, and analyzed 226 experimentally derived acute myocardial ischemic events. Simulation outcomes agreed both qualitatively (feature comparison) and quantitatively (correlation, average error, and significance) with experimentally obtained epicardial measurements, particularly under conditions of elevated ischemic stress. Our simulation framework introduces a novel approach to incorporating subject-specific, geometric models and experimental results that are highly resolved in space and time into computational models. 
We propose this framework as a means to advance the understanding of the underlying mechanisms of ischemic disease while simultaneously putting in place the computational infrastructure necessary to study and improve ischemia models aimed at reducing diagnostic errors in the clinic.
Computational models of myocardial ischemia often use oversimplified ischemic source representations to simulate epicardial potentials. The purpose of this study was to explore the influence of biophysically justified, subject-specific ischemic zone representations on epicardial potentials.
We developed and implemented an image-based simulation pipeline, using intramural recordings from a canine experimental model to define subject-specific ischemic regions within the heart. Static epicardial potential distributions, reflective of ST segment deviations, were simulated and validated against measured epicardial recordings.
Simulated epicardial potential distributions showed strong statistical correlation and visual agreement with measured epicardial potentials. Additionally, we identified and described in what way border zone parameters influence epicardial potential distributions during the ST segment.
From image-based simulations of myocardial ischemia, we generated subject-specific ischemic sources that accurately replicated epicardial potential distributions. Such models are essential in understanding the underlying mechanisms of the bioelectric fields that arise during ischemia and are the basis for more sophisticated simulations of body surface ECGs.
Background: Noninvasive localization of premature ventricular complexes (PVCs) to guide ablation therapy is one of the emerging applications of electrocardiographic imaging (ECGI). Because of its increasing clinical use, it is essential to compare the many implementations of ECGI that exist to understand the specific characteristics of each approach.
Objective: Our consortium is a community of researchers aiming to collaborate in the field of ECGI, and to objectively compare and improve methods. Here, we will compare methods to localize the origin of PVCs with ECGI.
Methods: Our consortium hosts a repository of ECGI data on its website. For the current study, participants analysed simulated electrocardiograms from premature beats, freely available on that website. These PVCs were simulated to originate from eight ventricular locations and the resulting body-surface potentials were computed. These body-surface electrocardiograms (and the torso-heart geometry) were then provided to the study participants to apply their ECGI algorithms to determine the origin of the PVCs. Participants could choose freely among four different source models, i.e., representations of the bioelectric fields reconstructed from ECGI: 1) epicardial potentials (POTepi), 2) epicardial & endocardial potentials (POTepi&endo), 3) transmembrane potentials on the endocardium and epicardium (TMPepi&endo) and 4) transmembrane potentials throughout the myocardium (TMPmyo). Participants were free to employ any software implementation of ECGI and were blinded to the ground truth data.
Results: Four research groups submitted 11 entries for this study. The figure shows the localization error between the known and reconstructed origin of each PVC for each submission, categorized per source model. Each colour represents one research group and some groups submitted results using different approaches. These results demonstrate that the variation of accuracy was larger among research groups than among the source models. Most submissions achieved an error below 2 cm, but none performed with a consistent sub-centimetre accuracy.
Conclusion: This study demonstrates a successful community-based approach to study different ECGI methods for PVC localization. The goal was not to rank research groups but to compare both source models and numerical implementations. PVC localization with these methods was not as dependent on the source representation as it was on the implementation of ECGI. Consequently, ECGI validation should not be performed on generic methods, but should be specifically performed for each lab's implementation. The novelty of this study is that it achieves this in the first open, international comparison of approaches using a common set of gold standards. Continued collaborative validation is essential to understand the effect of implementation differences, in order to reach significant improvements and arrive at clinically-relevant sub-centimetre accuracy of PVC localization.
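The localization error reported in the Results above is the distance between the known and the reconstructed PVC origin. A minimal sketch of that metric follows, assuming a plain Euclidean distance between 3D points expressed in centimetres in a shared coordinate frame; the abstract does not state the exact distance definition, and the function name and sample coordinates are hypothetical.

```python
import math


def localization_error(true_origin, estimated_origin):
    """Euclidean distance between two 3D points (same units as the inputs)."""
    if len(true_origin) != len(estimated_origin):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(
        sum((t - e) ** 2 for t, e in zip(true_origin, estimated_origin))
    )


# Made-up coordinates in cm, for illustration only.
error_cm = localization_error((0.0, 0.0, 0.0), (1.2, 0.9, 0.0))
below_two_cm = error_cm < 2.0
```

Under this assumption, a submission would meet the "below 2 cm" level mentioned above when `below_two_cm` is true for a given PVC site.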
M. Cluitmans, D. H. Brooks, R. MacLeod, O. Dössel, M. S. Guillem, P. M. van Dam, J. Svehlikova, B. He, J. Sapp, L. Wang, L. Bear.
Validation and Opportunities of Electrocardiographic Imaging: From Technical Achievements to Clinical Applications, In Frontiers in Physiology, Vol. 9, Frontiers Media SA, pp. 1305. 2018.
Electrocardiographic imaging (ECGI) reconstructs the electrical activity of the heart from a dense array of body-surface electrocardiograms and a patient-specific heart-torso geometry. Depending on how it is formulated, ECGI allows the reconstruction of the activation and recovery sequence of the heart, the origin of premature beats or tachycardia, the anchors/hotspots of re-entrant arrhythmias and other electrophysiological quantities of interest. Importantly, these quantities are directly and noninvasively reconstructed in a digitized model of the patient’s three-dimensional heart, which has led to clinical interest in ECGI’s ability to personalize diagnosis and guide therapy.
Despite considerable development over the last decades, validation of ECGI is challenging. Firstly, results depend considerably on implementation choices, which are necessary to deal with ECGI’s ill-posed character. Secondly, it is challenging to obtain (invasive) ground truth data of high quality. In this review, we discuss the current status of ECGI validation as well as the major challenges remaining for complete adoption of ECGI in clinical practice.
Specifically, showing clinical benefit is essential for the adoption of ECGI. Such benefit may lie in patient outcome improvement, workflow improvement, or cost reduction. Future studies should focus on these aspects to achieve broad adoption of ECGI, but only after the technical challenges have been solved for that specific application/pathology. We propose ‘best’ practices for technical validation and highlight collaborative efforts recently organized in this field. Continued interaction between engineers, basic scientists and physicians remains essential to find a hybrid between technical achievements, pathological mechanisms insights, and clinical benefit, to evolve this powerful technique towards a useful role in clinical practice.
Targeting Neuronal Fiber Tracts for Deep Brain Stimulation Therapy Using Interactive, Patient-Specific Models, In Journal of Visualized Experiments, No. 138, MyJove Corporation, Aug, 2018.
Deep brain stimulation (DBS), which involves insertion of an electrode to deliver stimulation to a localized brain region, is an established therapy for movement disorders and is being applied to a growing number of disorders. Computational modeling has been successfully used to predict the clinical effects of DBS; however, there is a need for novel modeling techniques to keep pace with the growing complexity of DBS devices. These models also need to generate predictions quickly and accurately. The goal of this project is to develop an image processing pipeline to incorporate structural magnetic resonance imaging (MRI) and diffusion weighted imaging (DWI) into an interactive, patient specific model to simulate the effects of DBS. A virtual DBS lead can be placed inside of the patient model, along with active contacts and stimulation settings, where changes in lead position or orientation generate a new finite element mesh and solution of the bioelectric field problem in near real-time, a timespan of approximately 10 seconds. This system also enables the simulation of multiple leads in close proximity to allow for current steering by varying anodes and cathodes on different leads. The techniques presented in this paper reduce the burden of generating and using computational models while providing meaningful feedback about the effects of electrode position, electrode design, and stimulation configurations to researchers or clinicians who may not be modeling experts.
Traumatic brain injury (TBI) is a looming epidemic, growing most rapidly in the elderly population. Some of the most devastating sequelae of TBI are related to depressed levels of consciousness (e.g., coma, minimally conscious state) or deficits in executive function. To date, pharmacological and rehabilitative therapies to treat these sequelae are limited. Deep brain stimulation (DBS) has been used to treat a number of pathologies, including Parkinson disease, essential tremor, and epilepsy. Animal and clinical research shows that targets addressing depressed levels of consciousness include components of the ascending reticular activating system and areas of the thalamus. Targets for improving executive function are more varied and include areas that modulate attention and memory, such as the frontal and prefrontal cortex, fornix, nucleus accumbens, internal capsule, thalamus, and some brainstem nuclei. The authors review the literature addressing the use of DBS to treat higher-order cognitive dysfunction and disorders of consciousness in TBI patients, while also offering suggestions on directions for future research.
Personalized virtual-heart technology for guiding the ablation of infarct-related ventricular tachycardia, In Nature Biomedical Engineering, Springer Nature America, Inc, September, 2018.
Ventricular tachycardia (VT), which can lead to sudden cardiac death, occurs frequently in patients with myocardial infarction. Catheter-based radio-frequency ablation of cardiac tissue has achieved only modest efficacy, owing to the inaccurate identification of ablation targets by current electrical mapping techniques, which can lead to extensive lesions and to a prolonged, poorly tolerated procedure. Here, we show that personalized virtual-heart technology based on cardiac imaging and computational modelling can identify optimal infarct-related VT ablation targets in retrospective animal (five swine) and human studies (21 patients), as well as in a prospective feasibility study (five patients). We first assessed, using retrospective studies (one of which included a proportion of clinical images with artefacts), the capability of the technology to determine the minimum-size ablation targets for eradicating all VTs. In the prospective study, VT sites predicted by the technology were targeted directly, without relying on prior electrical mapping. The approach could improve infarct-related VT ablation guidance, where accurate identification of patient-specific optimal targets could be achieved on a personalized virtual heart before the clinical procedure.
A. Rodenhauser, W.W. Good, B. Zenger, J. Tate, K. Aras, B. Burton, R.S. Macleod.
PFEIFER: Preprocessing Framework for Electrograms Intermittently Fiducialized from Experimental Recordings, In The Journal of Open Source Software, Vol. 3, No. 21, The Open Journal, pp. 472. Jan, 2018.
Preprocessing Framework for Electrograms Intermittently Fiducialized from Experimental Recordings (PFEIFER) is a MATLAB Graphical User Interface designed to process bioelectric signals acquired from experiments.
PFEIFER was specifically designed to process electrocardiographic recordings from electrodes placed on or around the heart or on the body surface. Specific steps included in PFEIFER allow the user to remove some forms of noise, correct for signal drift, and mark specific instants or intervals in time (fiducialize) within all of the time sampled channels. PFEIFER includes many unique features that allow the user to process electrical signals in a consistent and time efficient manner, with additional options for advanced user configurations and input. PFEIFER is structured as a consolidated framework that provides many standard processing pipelines but also has flexibility to allow the user to customize many of the steps. PFEIFER allows the user to import time aligned cardiac electrical signals, semi-automatically determine fiducial markings from those signals, and perform computational tasks that prepare the signals for subsequent display and analysis.
S. Thomas, J. Silvernagel, N. Angel, E. Kholmovski, E. Ghafoori, N. Hu, J. Ashton, D.J. Dosdall, R.S. MacLeod, R. Ranjan.
Higher contact force during radiofrequency ablation leads to a much larger increase in edema as compared to chronic lesion size, In Journal of Cardiovascular Electrophysiology, Wiley, June, 2018.
Reversible edema is a part of any radiofrequency ablation but its relationship with contact force is unknown. The goal of this study was to characterize through histology and MRI, acute and chronic ablation lesions and reversible edema with contact force.
Methods and results
In a canine model (n = 14), chronic ventricular lesions were created with a 3.5‐mm tip ThermoCool SmartTouch (Biosense Webster) catheter at 25 W or 40 W for 30 seconds. Repeat ablation was performed after 3 months to create a second set of lesions (acute). Each ablation procedure was followed by in vivo T2‐weighted MRI for edema and late‐gadolinium enhancement (LGE) MRI for lesion characterization. For chronic lesions, the mean scar volumes at 25 W and 40 W were 77.8 ± 34.5 mm³ (n = 24) and 139.1 ± 69.7 mm³ (n = 12), respectively. The volume of chronic lesions increased (25 W: P < 0.001, 40 W: P < 0.001) with greater contact force. For acute lesions, the mean volumes of the lesion were 286.0 ± 129.8 mm³ (n = 19) and 422.1 ± 113.1 mm³ (n = 16) for 25 W and 40 W, respectively (P < 0.001 compared to chronic scar). On T2‐weighted MRI, the acute edema volume was on average 5.6–8.7 times higher than the acute lesion volume and increased with contact force (25 W: P = 0.001, 40 W: P = 0.011).
With increasing contact force, there is a marginal increase in lesion size but accompanied with a significantly larger edema. The reversible edema that is much larger than the chronic lesion volume may explain some of the chronic procedure failures.
Atrial fibrillation (AF) is the most prevalent form of cardiac arrhythmia. Current treatments for AF remain suboptimal due to a lack of understanding of the underlying atrial structures that directly sustain AF. Existing approaches for analyzing atrial structures in 3D, especially from late gadolinium-enhanced (LGE)-MRIs, rely heavily on manual segmentation methods which are extremely labor-intensive and prone to errors. As a result, a robust and automated method for analyzing atrial structures in 3D is of high interest. We have therefore developed AtriaNet, a 16-layer convolutional neural network (CNN), on 154 3D LGE-MRIs with a spatial resolution of 0.625 mm × 0.625 mm × 1.25 mm from patients with AF, to automatically segment the left atrial (LA) epicardium and endocardium. AtriaNet consists of a multi-scaled, dual pathway architecture that captures both the local atrial tissue geometry, and the global positional information of LA using 13 successive convolutions, and 3 further convolutions for merging. By utilizing computationally efficient batch prediction, AtriaNet was able to successfully process each 3D LGE-MRI within one minute. Furthermore, benchmarking experiments showed that AtriaNet outperformed state-of-the-art CNNs, with a DICE score of 0.940 and 0.942 for the LA epicardium and endocardium respectively, and an inter-patient variance of <0.001. The estimated LA diameter and volume computed from the automatic segmentations were accurate to within 1.59 mm and 4.01 cm³ of the ground truths. Our proposed CNN was tested on the largest known dataset for LA segmentation, and to the best of our knowledge, it is the most robust approach that has ever been developed for segmenting LGE-MRIs. The increased accuracy of atrial reconstruction and analysis could potentially improve the understanding and treatment of AF.
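The DICE scores quoted above measure the overlap between an automatic segmentation and the ground truth. The sketch below shows the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name and the toy masks are assumptions for illustration; this is not AtriaNet's evaluation code.

```python
def dice_score(predicted, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(predicted, truth) if p and t)
    total = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks, for illustration only.
predicted_mask = [1, 1, 1, 0, 0, 0]
truth_mask = [1, 1, 0, 0, 0, 0]
score = dice_score(predicted_mask, truth_mask)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all.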
We compared the cranial base of newborn Pax7-deficient and wildtype mice using a computational shape modeling technology called particle-based modeling (PBM). We found systematic differences in the morphology of the basioccipital bone, including a broadening of the basioccipital bone and an antero-inferior inflection of its posterior edge in the Pax7-deficient mice. We show that the Pax7 cell lineage contributes to the basioccipital bone and that the location of the Pax7 lineage correlates with the morphology most affected by Pax7 deficiency. Our results suggest that the Pax7-deficient mouse may be a suitable model for investigating the genetic control of the location and orientation of the foramen magnum, and changes in the breadth of the basioccipital.
J. Coll-Font, S. Ariafar, D. H. Brooks. ECG-Based Reconstruction of Heart Position and Orientation with Bayesian Optimization, In Computing in Cardiology, Vol. 44, 2017.
Respiratory motion is known to cause beat-to-beat variation of the ECG. This observation suggests that it may be possible to use this variation to track position and orientation of the heart. Electrocardiographic Imaging (ECGI) would benefit from such a reconstruction since one contribution to errors in its solutions is respiratory motion of the heart. ECGI solutions generally rely on prior computation of a "forward" model that relates cardiac electrical activity to ECGs. However, the ill-posed nature of the inverse solution leads to large errors in ECGI even for small amounts of error in the forward model. The current work is a first step towards reducing those errors using a nominal forward model and the ECG itself. We describe a method that can reconstruct cardiac position / orientation using known potentials on both the heart and torso. Our current implementation is based on Bayesian Optimization and efficiently optimizes for the position / orientation of the heart to minimize error between measured and forward-computed torso potentials. We evaluated our approach with synthesized torso potentials under a model of respiratory motion and also using potentials recorded in a tank experiment on a canine epicardium and the tank surfaces. Our results show that our method performs accurately in synthetic experiments and can account for part of the error between forward-computed and measured ECGs in the tank experiments.
Background Magnetic resonance imaging (MRI) has been used to acutely visualize radiofrequency ablation lesions, but its accuracy in predicting chronic lesion size is unknown. The main goal of this study was to characterize different areas of enhancement in late gadolinium enhancement MRI done immediately after ablation to predict acute edema and chronic lesion size.
Methods and Results In a canine model (n=10), ventricular radiofrequency lesions were created using ThermoCool SmartTouch (Biosense Webster) catheter. All animals underwent MRI (late gadolinium enhancement and T2-weighted edema imaging) immediately after ablation and after 1, 2, 4, and 8 weeks. Edema, microvascular obstruction, and enhanced volumes were identified in MRI and normalized to chronic histological volume. Immediately after contrast administration, the microvascular obstruction region was 3.2±1.1 times larger than the chronic lesion volume in acute MRI. Even 60 minutes after contrast administration, edema was 8.7±3.31 times and the enhanced area 6.14±2.74 times the chronic lesion volume. Exponential fit to the microvascular obstruction volume was found to be the best predictor of chronic lesion volume at 26.14 minutes (95% prediction interval, 24.35–28.11 minutes) after contrast injection. The edema volume in late gadolinium enhancement correlated well with edema volume in T2-weighted MRI with an R² of 0.99.
Conclusion Microvascular obstruction region on acute late gadolinium enhancement images acquired 26.1 minutes after contrast administration can accurately predict the chronic lesion volume. We also show that T1-weighted MRI images acquired immediately after contrast injection accurately show edema resulting from radiofrequency ablation.
S. Ghimire, J. Dhamala, J. Coll-Font, J. D. Tate, M. S. Guillem, D. H. Brooks, R. S. MacLeod, L. Wang. Overcoming Barriers to Quantification and Comparison of Electrocardiographic Imaging Methods: A Community-Based Approach, In Computing in Cardiology, Vol. 44, 2017.
There has been a recent upsurge in the development of electrocardiographic imaging (ECGI) methods, along with a significant increase in clinical application. To better assess the state-of-the-art, enable reliable progress, and facilitate clinical adoption, it is important to be able to compare results in a comprehensive manner, scientifically and clinically. However, studies vary in modeling choices, computational methods, validation mechanisms and metrics, and clinical applications, making unified evaluation and comparison of ECGI a critical challenge.
This paper describes initial results of a project to address this challenge via a community-based approach organized by the Consortium for Electrocardiographic Imaging (CEI). We detail different aspects of this collective effort including a data sharing repository, a platform for comparison of different algorithms and modeling approaches on the same datasets, several active workgroups and progress made along these directions. We also summarize the results from groups participating in this collaboration and contributing solutions by applying their methods to the same dataset for comparison.
W. W. Good, B. Erem, J. Coll-Font, D. H. Brooks, R. S. MacLeod. Detecting Ischemic Stress to the Myocardium Using Laplacian Eigenmaps and Changes to Conduction Velocity, In Computing in Cardiology, Vol. 44, IEEE, 2017.
The underlying pathophysiology of ischemia and its electrocardiographic consequences are poorly understood, resulting in unreliable diagnosis of this disease. This limited knowledge of underlying mechanisms suggests a data driven approach, which seeks to identify patterns in the ECG that can be linked statistically to underlying behavior and conditions of ischemic stress. The gold standard ECG metrics for evaluating ischemia monitor vertical deflections within the ST segment. However, ischemia influences all portions of the electrogram. Another metric that targets the QRS complex during ischemia is Conduction Velocity (CV). An even more inclusive, data driven approach is known as "Laplacian Eigenmaps" (LE), which can identify trajectories, or "manifolds", that respond to different spatiotemporal consequences of ischemic stress, and these changes to the trajectories on the manifold may serve as a clinically relevant biomarker. In this study, we compared the LE- and CV-based markers against two gold standards for detecting ischemic stress, both derived from the ST segment. We evaluated the response time and fidelity of each biomarker using a Time to Threshold (TTT) and Contrast Ratio (CR) measure, over 51 episodes recorded as cardiac electrograms from a canine model of controlled ischemia. The results show that metrics designed to monitor regions beyond the ST segment can perform at least as well as, if not better than, traditional ST segment based metrics.
M. Kern, A. Lex, N. Gehlenborg, C. R. Johnson.
Interactive Visual Exploration And Refinement Of Cluster Assignments, In BMC Bioinformatics, Cold Spring Harbor Laboratory, April, 2017.
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data.
In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes.
Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.