|
Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA |
|
 Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA. Chaitanya Muralidhara, Andrew M. Gross, Robin R. Gutell, Orly Alter. PLoS ONE, Vol. 6, No. 4, pp. e18768, April 2011.
Evolutionary relationships among organisms are commonly described by using a hierarchy derived from comparisons of ribosomal RNA (rRNA) sequences. We propose that even on the level of a single rRNA molecule, an organism’s evolution is composed of multiple pathways due to concurrent forces that act independently upon different rRNA degrees of freedom. Relationships among organisms are then compositions of coexisting pathway-dependent similarities and dissimilarities, which cannot be described by a single hierarchy. We computationally test this hypothesis in comparative analyses of 16S and 23S rRNA sequence alignments by using a tensor decomposition, i.e., a framework for modeling composite data. Each alignment is encoded in a cuboid, i.e., a third-order tensor, where nucleotides, positions and organisms, each represent a degree of freedom. A tensor mode-1 higher-order singular value decomposition (HOSVD) is formulated such that it separates each cuboid into combinations of patterns of nucleotide frequency variation across organisms and positions, i.e., “eigenpositions” and corresponding nucleotide-specific segments of “eigenorganisms,” respectively, independent of a-priori knowledge of the taxonomic groups or rRNA structures. We find, in support of our hypothesis that, first, the significant eigenpositions reveal multiple similarities and dissimilarities among the taxonomic groups. Second, the corresponding eigenorganisms identify insertions or deletions of nucleotides exclusively conserved within the corresponding groups, that map out entire substructures and are enriched in adenosines, unpaired in the rRNA secondary structure, that participate in tertiary structure interactions. This demonstrates that structural motifs involved in rRNA folding and function are evolutionary degrees of freedom. Third, two previously unknown coexisting subgenic relationships between Microsporidia and Archaea are revealed in both the 16S and 23S rRNA alignments, a convergence and a divergence, conferred by insertions and deletions of these motifs, which cannot be described by a single hierarchy. This shows that mode-1 HOSVD modeling of rRNA alignments might be used to computationally predict evolutionary mechanisms.
Full Publication Appendix |
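As a rough illustration of the encoding described in the abstract above, the sketch below builds a nucleotides × positions × organisms cuboid from a toy alignment and takes the SVD of its unfolding. The symbol set `ACGU-`, the binary encoding, the function names, and the toy sequences are all assumptions made for this example; the paper's exact encoding and normalization may differ.

```python
import numpy as np

ALPHABET = "ACGU-"  # assumed symbol set; '-' marks an alignment gap


def encode_alignment(sequences):
    """Binary-encode aligned sequences as a nucleotides x positions x organisms cuboid."""
    n_positions = len(sequences[0])
    cuboid = np.zeros((len(ALPHABET), n_positions, len(sequences)))
    for organism, sequence in enumerate(sequences):
        for position, symbol in enumerate(sequence):
            cuboid[ALPHABET.index(symbol), position, organism] = 1.0
    return cuboid


def mode1_svd(cuboid):
    """Stack the nucleotide slices into a (nucleotides*positions) x organisms
    matrix and return its SVD."""
    unfolded = cuboid.reshape(-1, cuboid.shape[2])
    return np.linalg.svd(unfolded, full_matrices=False)


# Hypothetical toy alignment: one aligned sequence per organism.
alignment = ["ACGU-", "ACGUA", "UCG-A"]
cuboid = encode_alignment(alignment)

# Columns of `eigenorganisms` span nucleotide-specific position segments;
# rows of `eigenpositions` are patterns across the organisms.
eigenorganisms, singular_values, eigenpositions = mode1_svd(cuboid)
```

In this layout the rows of the unfolded matrix are grouped by nucleotide, so each left singular vector splits naturally into one segment per nucleotide.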
|
|
 Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression. L. Omberg, J.R. Meyerson, K. Kobayashi, L.S. Drury, J.F.X. Diffley, O. Alter. In Nature Molecular Systems Biology, Vol. 5, No. 312, pp. (published online). October, 2009.
This report provides a global view of how gene expression is affected by DNA replication. We analyzed synchronized cultures of Saccharomyces cerevisiae under conditions that prevent DNA replication initiation without delaying cell cycle progression. We use a higher-order singular value decomposition to integrate the global mRNA expression measured in the multiple time courses, detect and remove experimental artifacts and identify significant combinations of patterns of expression variation across the genes, time points and conditions. We find that, first, ~88% of the global mRNA expression is independent of DNA replication. Second, the requirement of DNA replication for efficient histone gene expression is independent of conditions that elicit DNA damage checkpoint responses. Third, origin licensing decreases the expression of genes with origins near their 3′ ends, revealing that downstream origins can regulate the expression of upstream genes. This confirms previous predictions from mathematical modeling of a global causal coordination between DNA replication origin activity and mRNA expression, and shows that mathematical modeling of DNA microarray data can be used to correctly predict previously unknown biological modes of regulation.
Full Publication Supplement |
|
|
 A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data From Different Studies. L. Omberg, G.H. Golub, O. Alter. In Proceedings of the National Academy of Sciences, Vol. 104, No. 47, pp. 18371--18376. November, 2007.
We describe the use of a higher-order singular value decomposition (HOSVD) in transforming a data tensor of genes × “x-settings,” that is, different settings of the experimental variable x × “y-settings,” which tabulates DNA microarray data from different studies, to a “core tensor” of “eigenarrays” × “x-eigengenes” × “y-eigengenes.” Reformulating this multilinear HOSVD such that it decomposes the data tensor into a linear superposition of all outer products of an eigenarray, an x- and a y-eigengene, that is, rank-1 “subtensors,” we define the significance of each subtensor in terms of the fraction of the overall information in the data tensor that it captures. We illustrate this HOSVD with an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are under exposure to either hydrogen peroxide or menadione. We find that significant subtensors represent independent biological programs or experimental phenomena. The picture that emerges suggests that the conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that these genes are involved in, may play significant, yet previously unrecognized, roles in the differential effects of hydrogen peroxide and menadione on cell cycle progression. A genome-scale correlation between DNA replication initiation and RNA transcription, which is equivalent to a recently discovered correlation and might be due to a previously unknown mechanism of regulation, is independently uncovered.
Full Publication |
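The HOSVD summarized in the abstract above can be sketched for a small third-order tensor as follows. This is a minimal illustration, not the authors' implementation: the helper names, the random toy tensor, and the choice to measure each subtensor's significance as its squared core entry divided by the sum of all squared core entries are assumptions made here.

```python
import numpy as np


def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


# Hypothetical toy data: 5 genes x 3 x-settings x 2 y-settings.
rng = np.random.default_rng(0)
tensor = rng.normal(size=(5, 3, 2))

core, factors = hosvd(tensor)

# Assumed significance measure: share of the squared core entries.
fractions = core**2 / np.sum(core**2)

# Superposing all rank-1 subtensors recovers the data tensor.
reconstructed = core
for mode, u in enumerate(factors):
    reconstructed = mode_product(reconstructed, u, mode)
```

Each entry of `core` weights the outer product of one column from each factor matrix, which is the rank-1 subtensor picture the abstract describes.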
|
|
 Discovery of Principles of Nature from Mathematical Modeling of DNA Microarray Data. O. Alter. In Proceedings of the National Academy of Sciences, Vol. 103, No. 44, pp. 16063--16064. October, 2006.
Recent advances in DNA microarray hybridization technology make it possible to record the molecular biological signals, e.g., mRNA expression levels and proteins’ DNA-binding occupancy levels, that guide the progression of cellular processes on genomic scales. Biology and medicine today may be at a point similar to where physics was after the advent of the telescope.
Full Publication |
|
|
 Singular Value Decomposition of Genome-Scale mRNA Lengths Distribution Reveals Asymmetry in RNA Gel Electrophoresis Band Broadening. O. Alter, G. H. Golub. In Proceedings of the National Academy of Sciences, Vol. 103, No. 32, pp. 11828--11833. August, 2006.
We describe the singular value decomposition (SVD) of yeast genome-scale mRNA lengths distribution data measured by DNA microarrays. SVD uncovers in the mRNA abundance levels data matrix of genes × arrays, i.e., electrophoretic gel migration lengths or mRNA lengths, mathematically unique decorrelated and decoupled “eigengenes.” The eigengenes are the eigenvectors of the arrays × arrays correlation matrix, with the corresponding series of eigenvalues proportional to the series of the “fractions of eigen abundance.” Each fraction of eigen abundance indicates the significance of the corresponding eigengene relative to all others. We show that the eigengenes fit “asymmetric Hermite functions,” a generalization of the eigenfunctions of the quantum harmonic oscillator and the integral transform whose kernel is a generalized coherent state. The fractions of eigen abundance fit a geometric series as do the eigenvalues of the integral transform whose kernel is a generalized coherent state. The “asymmetric generalized coherent state” models the measured data, where the profiles of mRNA abundance levels of most genes as well as the distribution of the peaks of these profiles fit asymmetric Gaussians. We hypothesize that the asymmetry in the distribution of the peaks of the profiles is due to two competing evolutionary forces. We show that the asymmetry in the profiles of the genes might be due to a previously unknown asymmetry in the gel electrophoresis thermal broadening of a moving, rather than a stationary, band of RNA molecules.
Full Publication |
|
|
 Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations. O. Alter, G.H. Golub. In Proceedings of the National Academy of Sciences, Vol. 102, No. 49, pp. 17559--17564. December, 2005.
We describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways that compose a cellular system from genome-scale nondirectional networks of correlations among the genes of the system. The EVD formulates a genes × genes network as a linear superposition of genes × genes decorrelated and decoupled rank-1 subnetworks, which can be associated with functionally independent pathways. The integrative pseudoinverse projection of a network computed from a “data” signal onto a designated “basis” signal approximates the network as a linear superposition of only the subnetworks that are common to both signals and simulates observation of only the pathways that are manifest in both experiments. We define a comparative HOEVD that formulates a series of networks as linear superpositions of decorrelated rank-1 subnetworks and the rank-2 couplings among these subnetworks, which can be associated with independent pathways and the transitions among them common to all networks in the series or exclusive to a subset of the networks. Boolean functions of the discretized subnetworks and couplings highlight differential, i.e., pathway-dependent, relations among genes. We illustrate the EVD, pseudoinverse projection, and HOEVD of genome-scale networks with analyses of yeast DNA microarray data.
Full Publication Appendix |
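The EVD step in the abstract above, writing a genes × genes network as a superposition of rank-1 subnetworks, might look like the following sketch. Building the network as the product of a signal matrix with its transpose, the function name, and the toy signal are assumptions made for illustration.

```python
import numpy as np


def rank1_subnetworks(signal):
    """Decompose the genes x genes network of a genes x arrays signal
    into eigenvalue-weighted rank-1 subnetworks."""
    network = signal @ signal.T  # assumed definition of the correlation network
    eigenvalues, eigenvectors = np.linalg.eigh(network)
    order = np.argsort(eigenvalues)[::-1]  # most significant subnetwork first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    subnetworks = [
        value * np.outer(vector, vector)
        for value, vector in zip(eigenvalues, eigenvectors.T)
    ]
    return network, eigenvalues, subnetworks


# Hypothetical toy signal: 4 genes x 3 arrays.
signal = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])

network, eigenvalues, subnetworks = rank1_subnetworks(signal)
```

Because the eigenvectors of a symmetric matrix are orthonormal, the subnetworks are decorrelated, and adding them all back together returns the original network.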
|
|
 Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation Between DNA Replication and RNA Transcription. O. Alter, G.H. Golub. In Proceedings of the National Academy of Sciences, Vol. 101, No. 47, pp. 16577--16582. November, 2004.
We describe an integrative data-driven mathematical framework that formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the “basis” set. By using pseudoinverse projection, the molecular biological profiles of the data samples are least-squares-approximated as superpositions of the basis profiles. Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis and gives a global picture of the correlations and possibly also causal coordination of these two sets of states. We illustrate this framework with an integration of yeast genome-scale proteins’ DNA-binding data with cell cycle mRNA expression time course data. Novel correlation between DNA replication initiation and RNA transcription during the yeast cell cycle, which might be due to a previously unknown mechanism of regulation, is predicted.
Full Publication |
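The pseudoinverse projection described in the abstract above amounts to a least-squares fit of each data sample by the basis profiles. The sketch below shows that step with NumPy; the function name and the toy matrices are hypothetical, and the preprocessing used in the paper is not reproduced.

```python
import numpy as np


def pseudoinverse_projection(data, basis):
    """Least-squares approximate each data sample as a superposition of basis profiles.

    data:  genes x data_samples matrix
    basis: genes x basis_profiles matrix
    """
    coefficients = np.linalg.pinv(basis) @ data
    reconstruction = basis @ coefficients
    return coefficients, reconstruction


# Hypothetical toy basis: 5 genes x 2 basis profiles.
basis = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 0.0],
    [0.0, 0.0],
])

# Toy data: a known mix of the basis profiles plus a part the basis cannot express.
true_coefficients = np.array([[2.0, 0.0], [-1.0, 3.0]])
outside_basis = np.array([
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [1.0, 2.0],
    [3.0, 4.0],
])
data = basis @ true_coefficients + outside_basis

coefficients, reconstruction = pseudoinverse_projection(data, basis)
residual = data - reconstruction
```

The reconstruction keeps only the part of the data that the basis can express, which is how the projection simulates observing only the cellular states shared with the basis.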
|
|
 Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. O. Alter, P.O. Brown, D. Botstein. In Proceedings of the National Academy of Sciences, Vol. 100, No. 6, pp. 3351--3356. March, 2003.
We describe a comparative mathematical framework for two genome-scale expression data sets. This framework formulates expression as superposition of the effects of regulatory programs, biological processes, and experimental artifacts common to both data sets, as well as those that are exclusive to one data set or the other, by using generalized singular value decomposition. This framework enables comparative reconstruction and classification of the genes and arrays of both data sets. We illustrate this framework with a comparison of yeast and human cell-cycle expression data sets.
Full Publication |
|

|
 Processing and Modeling Genome-Wide Expression Data Using Singular Value Decomposition. O. Alter, P.O. Brown, D. Botstein. In Microarrays: Optical Technologies and Informatics, Vol. 4266, Edited by M. L. Bittner, Y. Chen, A. N. Dorsel and E. R. Dougherty (International Society for Optical Engineering, Bellingham), pp. 171--186. 2001.
We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent additive or multiplicative noise, experimental artifacts, or even irrelevant biological processes enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
Full Publication Appendix |
|
|
 Singular value decomposition for genome-wide expression data processing and modeling. Orly Alter, Patrick O. Brown, and David Botstein. In Proceedings of the National Academy of Sciences, Vol. 97, No. 18, pp. 10101--10106. August, 2000.
We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
Full Publication |
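The transformation and filtering described in the abstract above can be sketched with NumPy's SVD. The function names and the toy matrix (a constant offset plus an oscillation) are assumptions for illustration, and which eigengene counts as an artifact is a judgment the analyst makes; here the first one is removed simply because the toy data were built that way.

```python
import numpy as np


def svd_eigengenes(expression):
    """SVD of a genes x arrays matrix.

    Returns the eigenarrays (columns), singular values, eigengenes (rows),
    and the fraction of the overall expression each eigengene captures.
    """
    eigenarrays, singular_values, eigengenes = np.linalg.svd(
        expression, full_matrices=False
    )
    fractions = singular_values**2 / np.sum(singular_values**2)
    return eigenarrays, singular_values, eigengenes, fractions


def filter_eigengene(expression, index):
    """Remove one eigengene and its eigenarray by zeroing its singular value."""
    eigenarrays, singular_values, eigengenes, _ = svd_eigengenes(expression)
    kept = singular_values.copy()
    kept[index] = 0.0
    return (eigenarrays * kept) @ eigengenes


# Hypothetical toy data: 6 genes x 4 arrays.
time = np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)
amplitudes = np.array([1.0, -1.0, 2.0, -2.0, 0.5, -0.5])
oscillation = np.outer(amplitudes, np.cos(time))
offset = 5.0 * np.ones((6, 4))  # stands in for a steady-state artifact
data = offset + oscillation

_, _, eigengenes, fractions = svd_eigengenes(data)
normalized = filter_eigengene(data, 0)  # drop the dominant, constant eigengene
```

In this toy case the offset and the oscillation are orthogonal, so removing the first eigengene leaves exactly the oscillating part of the data.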
|
|
|