We develop a novel supervised learning/classification method, called disjunctive normal random forest (DNRF). A DNRF is an ensemble of randomly trained disjunctive normal decision trees (DNDT). To construct a DNDT, we formulate each decision tree in the random forest as a disjunction of rules, which are conjunctions of Boolean functions. We then approximate this disjunction of conjunctions with a differentiable function and approach the learning process as a risk minimization problem that incorporates the classification error into a single global objective function. The minimization problem is solved using gradient descent. DNRFs are able to learn complex decision boundaries and achieve low generalization error. We present experimental results demonstrating the improved performance of DNDTs and DNRFs over conventional decision trees and random forests. We also show the superior performance of DNRFs over state-of-the-art classification methods on benchmark datasets.
The local field potential (LFP) is of growing importance in neurophysiology as a metric of network activity and as a readout signal for use in brain-machine interfaces. However, there are uncertainties regarding the kind and visual field extent of information carried by LFP signals, as well as the specific features of the LFP signal conveying such information, especially under naturalistic conditions. To address these questions, we recorded LFP responses to natural images in V1 of awake and anesthetized macaques using Utah multielectrode arrays. First, we have shown that it is possible to identify presented natural images from the LFP responses they evoke using trained Gabor wavelet (GW) models. Because GW models were devised to explain the spiking responses of V1 cells, this finding suggests that local spiking activity and LFPs (thought to reflect primarily local synaptic activity) carry similar visual information. Second, models trained on scalar metrics, such as the evoked LFP response range, provide robust image identification, supporting the informative nature of even simple LFP features. Third, image identification is robust only for the first 300 ms following image presentation, and image information is not restricted to any of the spectral bands. This suggests that the short-latency broadband LFP response carries most information during natural scene viewing. Finally, best image identification was achieved by GW models incorporating information at the scale of ∼0.5° in size and trained using four different orientations. This suggests that during natural image viewing, LFPs carry stimulus-specific information at spatial scales corresponding to few orientation columns in macaque V1.
SCI Institute. Note: ShapeWorks: An open-source tool for constructing compact statistical point-based models of ensembles of similar shapes that does not rely on specific surface parameterization. Scientific Computing and Imaging Institute (SCI). Download from: http://www.sci.utah.edu/software/shapeworks.html, 2015.
P. Skraba, Bei Wang, G. Chen, P. Rosen. Robustness-Based Simplification of 2D Steady and Unsteady Vector Fields, In IEEE Transactions on Visualization and Computer Graphics (to appear), 2015.
SLASH. Note: SLASH: A hybrid system for high-throughput segmentation of large neuropil datasets, SLASH is funded by the National Institute of Neurological Disorders and Stroke (NINDS) grant 5R01NS075314-03., 2015.
H. Strobelt, B. Alsallakh, J. Botros, B. Peterson, M. Borowsky, H. Pfister,, A. Lex. Vials: Visualizing Alternative Splicing of Genes, In IEEE Transactions on Visualization and Computer Graphics (InfoVis '15), Vol. 22, No. 1, pp. 399-408. 2015.
Alternative splicing is a process by which the same DNA sequence is used to assemble different proteins, called protein isoforms. Alternative splicing works by selectively omitting some of the coding regions (exons) typically associated with a gene. Detection of alternative splicing is difficult and uses a combination of advanced data acquisition methods and statistical inference. Knowledge about the abundance of isoforms is important for understanding both normal processes and diseases and to eventually improve treatment through targeted therapies. The data, however, is complex and current visualizations for isoforms are neither perceptually efficient nor scalable. To remedy this, we developed Vials, a novel visual analysis tool that enables analysts to explore the various datasets that scientists use to make judgments about isoforms: the abundance of reads associated with the coding regions of the gene, evidence for junctions, i.e., edges connecting the coding regions, and predictions of isoform frequencies. Vials is scalable as it allows for the simultaneous analysis of many samples in multiple groups. Our tool thus enables experts to (a) identify patterns of isoform abundance in groups of samples and (b) evaluate the quality of the data. We demonstrate the value of our tool in case studies using publicly available datasets.
B. Summa, A. A. Gooch, G. Scorzelli, V. Pascucci.
Paint and Click: Unified Interactions for Image Boundaries, In Computer Graphics Forum, Vol. 34, No. 2, Wiley-Blackwell, pp. 385--393. May, 2015.
Image boundaries are a fundamental component of many interactive digital photography techniques, enabling applications such as segmentation, panoramas, and seamless image composition. Interactions for image boundaries often rely on two complementary but separate approaches: editing via painting or clicking constraints. In this work, we provide a novel, unified approach for interactive editing of pairwise image boundaries that combines the ease of painting with the direct control of constraints. Rather than a sequential coupling, this new formulation allows full use of both interactions simultaneously, giving users unprecedented flexibility for fast boundary editing. To enable this new approach, we provide technical advancements. In particular, we detail a reformulation of image boundaries as a problem of finding cycles, expanding and correcting limitations of the previous work. Our new formulation provides boundary solutions for painted regions with performance on par with state-of-the-art specialized, paint-only techniques. In addition, we provide instantaneous exploration of the boundary solution space with user constraints. Finally, we provide examples of common graphics applications impacted by our new approach.
M. R. Swanson, J. J. Wolff, J. T. Elison, H. Gu, H. C. Hazlett, K. Botteron, M. Styner, S. Paterson, G. Gerig, J. Constantino, S. Dager, A. Estes, C. Vachet, J. Piven.
Splenium development and early spoken language in human infants, In Developmental Science, Wiley Online Library, 2015.
The association between developmental trajectories of language-related white matter fiber pathways from 6 to 24 months of age and individual differences in language production at 24 months of age was investigated. The splenium of the corpus callosum, a fiber pathway projecting through the posterior hub of the default mode network to occipital visual areas, was examined as well as pathways implicated in language function in the mature brain, including the arcuate fasciculi, uncinate fasciculi, and inferior longitudinal fasciculi. The hypothesis that the development of neural circuitry supporting domain-general orienting skills would relate to later language performance was tested in a large sample of typically developing infants. The present study included 77 infants with diffusion weighted MRI scans at 6, 12 and 24 months and language assessment at 24 months. The rate of change in splenium development varied significantly as a function of language production, such that children with greater change in fractional anisotropy (FA) from 6 to 24 months produced more words at 24 months. Contrary to findings from older children and adults, significant associations between language production and FA in the arcuate, uncinate, or left inferior longitudinal fasciculi were not observed. The current study highlights the importance of tracing brain development trajectories from infancy to fully elucidate emerging brain–behavior associations while also emphasizing the role of the splenium as a key node in the structural network that supports the acquisition of spoken language.
Note: VisTrails: A scientific workflow management system. Scientific Computing and Imaging Institute (SCI), Download from: http://www.vistrails.org, 2015.
R. Whitaker, W. Thompson, J. Berger, B. Fischhof, M. Goodchild, M. Hegarty, C. Jermaine, K. S. McKinley, A. Pang, J. Wendelberger. Workshop on Quantification, Communication, and Interpretation of Uncertainty in Simulation and Data Science, Note: Computing Community Consortium, 2015.
Modern science, technology, and politics are all permeated by data that comes from people, measurements, or computational processes. While this data is often incomplete, corrupt, or lacking in sufficient accuracy and precision, explicit consideration of uncertainty is rarely part of the computational and decision making pipeline. The CCC Workshop on Quantification, Communication, and Interpretation of Uncertainty in Simulation and Data Science explored this problem, identifying significant shortcomings in the ways we currently process, present, and interpret uncertain data. Specific recommendations on a research agenda for the future were made in four areas: uncertainty quantification in large-scale computational simulations, uncertainty quantification in data science, software support for uncertainty computation, and better integration of uncertainty quantification and communication to stakeholders.
J. J. Wolff, G. Gerig, J. D. Lewis, T. Soda, M. A. Styner, C. Vachet, K. N. Botteron, J. T. Elison, S. R. Dager, A. M. Estes, H. C. Hazlett, R. T. Schultz, L. Zwaigenbaum, J. Piven.
Altered corpus callosum morphology associated with autism over the first 2 years of life, In Brain, 2015.
M. Zhang, P. T. Fletcher. Finite-Dimensional Lie Algebras for Fast Diffeomorphic Image Registration, In Information Processing in Medical Imaging (IPMI), 2015.
M. Zhang, P. T. Fletcher. Bayesian Principal Geodesic Analysis for Estimating Intrinsic Diffeomorphic Image Variability, In Medical Image Analysis (accepted), 2015.
M. Zhang, H. Shao, P. T. Fletcher. A Mixture Model for Automatic Diffeomorphic Multi-Atlas Building, In MICCAI Workshop, Springer, 2015.
Computing image atlases that are representative of a dataset
is an important first step for statistical analysis of images. Most current approaches estimate a single atlas to represent the average of a large population of images, however, a single atlas is not sufficiently expressive to capture distributions of images with multiple modes. In this paper, we present a mixture model for building diffeomorphic multi-atlases that can represent sub-populations without knowing the category of each observed data point. In our probabilistic model, we treat diffeomorphic image transformations as latent variables, and integrate them out using a Monte Carlo Expectation Maximization (MCEM) algorithm via Hamiltonian Monte Carlo (HMC) sampling. A key benefit of our model is that the mixture modeling inference procedure results in an automatic clustering of the dataset. Using 2D synthetic data generated from known parameters, we demonstrate the ability of our model to successfully recover the multi-atlas and automatically cluster the dataset. We also show the effectiveness of the proposed method in a multi-atlas estimation problem for 3D brain images.
G. Adluru, Y. Gur, J. Anderson, L. Richards, N. Adluru, E. DiBella. Assessment of white matter microstructure in stroke patients using NODDI, In Proceedings of the 2014 IEEE Int. Conf. Engineering and Biology Society (EMBC), 2014.
Diffusion weighted imaging (DWI) is widely used to study changes in white matter following stroke. In various studies employing diffusion tensor imaging (DTI) and high angular resolution diffusion imaging (HARDI) modalities, it has been shown that fractional anisotropy (FA), mean diffusivity (MD), and generalized FA (GFA) can be used as measures of white matter tract integrity in stroke patients. However, these measures may be non-specific, as they do not directly delineate changes in tissue microstructure. Multi-compartment models overcome this limitation by modeling DWI data using a set of indices that are directly related to white matter microstructure. One of these models which is gaining popularity, is neurite orientation dispersion and density imaging (NODDI). This model uses conventional single or multi-shell HARDI data to describe fiber orientation dispersion as well as densities of different tissue types in the imaging voxel. In this paper, we apply for the first time the NODDI model to 4-shell HARDI stroke data. By computing NODDI indices over the entire brain in two stroke patients, and comparing tissue regions in ipsilesional and contralesional hemispheres, we demonstrate that NODDI modeling provides specific information on tissue microstructural changes. We also introduce an information theoretic analysis framework to investigate the non-local effects of stroke in the white matter. Our initial results suggest that the NODDI indices might be more specific markers of white matter reorganization following stroke than other measures previously used in studies of stroke recovery.
S.P. Awate, R.T. Whitaker.
Multiatlas Segmentation as Nonparametric Regression, In IEEE Trans Med Imaging, April, 2014.
PubMed ID: 24802528
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and labelfusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
S.P. Awate, Y.-Y. Yu, R.T. Whitaker. Kernel Principal Geodesic Analysis, In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Springer LNAI, 2014.
Kernel principal component analysis (kPCA) has been proposed as a dimensionality-reduction technique that achieves nonlinear, low-dimensional representations of data via the mapping to kernel feature space. Conventionally, kPCA relies on Euclidean statistics in kernel feature space. However, Euclidean analysis can make kPCA inefficient or incorrect for many popular kernels that map input points to a hypersphere in kernel feature space. To address this problem, this paper proposes a novel adaptation of kPCA, namely kernel principal geodesic analysis (kPGA), for hyperspherical statistical analysis in kernel feature space. This paper proposes tools for statistical analyses on the Riemannian manifold of the Hilbert sphere in the reproducing kernel Hilbert space, including algorithms for computing the sample weighted Karcher mean and eigen analysis of the sample weighted Karcher covariance. It then applies these tools to propose novel methods for (i)~dimensionality reduction and (ii)~clustering using mixture-model fitting. The results, on simulated and real-world data, show that kPGA-based methods perform favorably relative to their kPCA-based analogs.
H. Bhatia, V. Pascucci, R.M. Kirby, P.-T. Bremer.
Extracting Features from Time-Dependent Vector Fields Using Internal Reference Frames, In Computer Graphics Forum, Vol. 33, No. 3, pp. 21--30. June, 2014.