research

Computational Biology


In the Genomic Signal Processing Lab at the University of Utah, we develop generalizations of the matrix and tensor computations that underlie theoretical physics, and use them to create models that compare and integrate different types of large-scale molecular biological data, such as DNA microarray data, and computationally predict global mechanisms that govern the activity of DNA and RNA. We believe that future discovery and control in biology and medicine will come from the mathematical modeling of such large-scale molecular biological data, just as Kepler discovered the laws of planetary motion by using mathematics to describe trends in astronomical data. We pioneered the use of the matrix singular value decomposition (SVD), the tensor higher-order SVD (HOSVD) and their generalizations in modeling different types of genomic data from different studies of cell division and cancer and from different organisms. Our recent experimental results verify our computational prediction of a mechanism of regulation that correlates DNA replication origin activity with mRNA expression. This demonstrates for the first time that mathematical modeling of DNA microarray data, in which the mathematical variables and operations represent biological reality, can be used, beyond classification of genes and cellular samples, to correctly predict previously unknown global biological mechanisms. We now extend our recent computational results, modeling data from the Cancer Genome Atlas, to formulate and implement a protocol for the utilization of recent global profiling biotechnologies in the computational prognosis of cancers. Ultimately, our work will bring physicians a step closer to one day being able to predict and control the progression of cancers as readily as NASA engineers plot the trajectories of spacecraft today.

Genomic Signal Processing Lab



 
Generation of Cloned Transgenic Goats with Cardiac Specific Overexpression of Transforming Growth Factor β1
Q. Meng, J. Hall, H. Rutigliano, X. Zhou, B.R. Sessions, R. Stott, K. Panter, C.J. Davies, R. Ranjan, D. Dosdall, R.S. MacLeod, N. Marrouche, K.L. White, Z. Wang, I.A. Polejaeva. In Reproduction, Fertility and Development, Vol. 25, No. 1, pp. 162--163. 2012.
DOI: 10.1071/RDv25n1Ab30

Transforming growth factor β1 (TGF-β1) has a potent profibrotic function and is central to signaling cascades involved in interstitial fibrosis, which plays a critical role in the pathobiology of cardiomyopathy and contributes to diastolic and systolic dysfunction. In addition, fibrotic remodeling is responsible for generation of re-entry circuits that promote arrhythmias (Bujak and Frangogiannis 2007 Cardiovasc. Res. 74, 184–195). Because the mouse heart is small, functional electrophysiology in transgenic mice is problematic. Large transgenic animal models have the potential to offer insights into conduction heterogeneity associated with fibrosis and the role of fibrosis in cardiovascular diseases. The goal of this study was to generate transgenic goats overexpressing an active form of TGF-β1 under control of the cardiac-specific α-myosin heavy chain (α-MHC) promoter. A pcDNA3.1DV5-MHC-TGF-β1cys33ser vector was constructed by subcloning the MHC-TGF-β1 fragment from the plasmid pUC-BM20-MHC-TGF-β1 (Nakajima et al. 2000 Circ. Res. 86, 571–579) into the pcDNA3.1D V5 vector. The Neon transfection system was used to electroporate primary goat fetal fibroblasts. After G418 selection and PCR screening, transgenic cells were used for somatic cell nuclear transfer (SCNT). Oocytes were collected by slicing ovaries from an abattoir and matured in vitro in an incubator with 5% CO2 in air. Cumulus cells were removed at 21 to 23 h post-maturation. Oocytes were enucleated by aspirating the first polar body and nearby cytoplasm by micromanipulation in Hepes-buffered SOF medium with 10 µg mL–1 of cytochalasin B. Transgenic somatic cells were individually inserted into the perivitelline space and fused with enucleated oocytes using double electrical pulses of 1.8 kV cm–1 (40 µs each). Reconstructed embryos were activated by ionomycin (5 min) and DMAP and cycloheximide (CHX) treatments. Cloned embryos were cultured in G1 medium for 12 to 60 h in vitro and then transferred into synchronized recipient females.
Pregnancy was examined by ultrasonography on day 30 post-transfer. A total of 246 cloned embryos were transferred into 14 recipients, resulting in the birth of 7 kids. The pregnancy rate was higher in the group cultured for 12 h than in the group cultured for 36 to 60 h [44.4% (n = 9) v. 20% (n = 5)]. The kidding rates per embryo transferred in these 2 groups were 3.8% (n = 156) and 1.1% (n = 90), respectively. The PCR results confirmed that all the clones were transgenic. Phenotype characterization [e.g. gene expression, electrocardiogram (ECG), and magnetic resonance imaging (MRI)] is underway. We demonstrated successful production of transgenic goats via SCNT. To our knowledge, this is the first transgenic goat model produced for cardiovascular research.




GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival
C.H. Lee, B.O. Alpert, P. Sankaranarayanan, O. Alter. In PLoS ONE, Vol. 7, No. 1, Public Library of Science, pp. e30098. 2012.
DOI: 10.1371/journal.pone.0030098

Despite recent large-scale profiling efforts, the best prognostic predictor of glioblastoma multiforme (GBM) remains the patient's age at diagnosis. We describe a global pattern of tumor-exclusive co-occurring copy-number alterations (CNAs) that is correlated, and possibly coordinated, with GBM patients' survival and response to chemotherapy. The pattern is revealed by GSVD comparison of patient-matched but probe-independent GBM and normal aCGH datasets from The Cancer Genome Atlas (TCGA). We find that, first, the GSVD, formulated as a framework for comparatively modeling two composite datasets, removes from the pattern copy-number variations (CNVs) that occur in the normal human genome (e.g., female-specific X chromosome amplification) and experimental variations (e.g., in tissue batch, genomic center, hybridization date and scanner), without a priori knowledge of these variations. Second, the pattern includes most known GBM-associated changes in chromosome numbers and focal CNAs, as well as several previously unreported CNAs in greater than 3% of the patients. These include the biochemically putative drug target, cell cycle-regulated serine/threonine kinase-encoding TLK2, the cyclin E1-encoding CCNE1, and the Rb-binding histone demethylase-encoding KDM5A. Third, the pattern provides a better prognostic predictor than the chromosome numbers or any one focal CNA that it identifies, suggesting that the GBM survival phenotype is an outcome of its global genotype. The pattern is independent of age and, combined with age, makes a better predictor than age alone. GSVD comparison of matched profiles of a larger set of TCGA patients, inclusive of the initial set, confirms the global pattern. GSVD classification of the GBM profiles of an independent set of patients validates the prognostic contribution of the pattern.
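
The GSVD named in this abstract factors two datasets that share a column dimension (here, the matched patients) into dataset-specific left bases and one shared right factor. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `gsvd`, the QR-then-SVD route, and the random toy matrices `tumor` and `normal` are all illustrative assumptions, and the sketch assumes each matrix has at least as many rows as columns and full column rank.

```python
import numpy as np

def gsvd(d1, d2):
    """Sketch of a GSVD for two column-matched matrices.

    Returns u1, c, u2, s, x such that
        d1 = u1 @ diag(c) @ x   and   d2 = u2 @ diag(s) @ x,
    with c**2 + s**2 == 1. Assumes rows >= columns in both inputs,
    full column rank, and every s > 0 (hypothetical simplifications).
    """
    m1 = d1.shape[0]
    # Stack the datasets and orthogonalize: [d1; d2] = q @ r
    q, r = np.linalg.qr(np.vstack([d1, d2]))
    q1, q2 = q[:m1], q[m1:]
    # SVD of the top block: q1 = u1 @ diag(c) @ wt
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    # Columns of q2 @ w are mutually orthogonal with norms s
    z = q2 @ wt.T
    s = np.linalg.norm(z, axis=0)
    u2 = z / s
    # Shared right factor: each row is a pattern across the matched columns
    x = wt @ r
    return u1, c, u2, s, x

# Toy data only: 8 and 6 probes measured on the same 4 matched samples
rng = np.random.default_rng(0)
tumor = rng.normal(size=(8, 4))
normal = rng.normal(size=(6, 4))
u1, c, u2, s, x = gsvd(tumor, normal)
ratio = c / s  # large ratio: pattern weighted toward the first dataset
```

A row of `x` whose ratio `c / s` is large is relatively more significant in the first matrix than in the second, which is the sense in which a pattern can be called exclusive to one dataset.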




A Conserved Developmental Patterning Network Produces Quantitatively Different Output in Multiple Species of Drosophila
C. Fowlkes, K. Eckenrode, M. Bragdon, M.D. Meyer, Z. Wunderlich, L. Simirenko, C. Luengo, S. Keranen, C. Henriquez, D. Knowles, M. Biggin, M. Eisen, A. DePace. In PLoS Genetics, Vol. 7, No. 10, pp. e1002346 (17 pages). October, 2011.

Differences in the level, timing, or location of gene expression can contribute to alternative phenotypes at the molecular and organismal level. Understanding the origins of expression differences is complicated by the fact that organismal morphology and gene regulatory networks could potentially vary even between closely related species. To assess the scope of such changes, we used high-resolution imaging methods to measure mRNA expression in blastoderm embryos of Drosophila yakuba and Drosophila pseudoobscura and assembled these data into cellular resolution atlases, where expression levels for 13 genes in the segmentation network are averaged into species-specific, cellular resolution morphological frameworks. We demonstrate that the blastoderm embryos of these species differ in their morphology in terms of size, shape, and number of nuclei. We present an approach to compare cellular gene expression patterns between species, while accounting for varying embryo morphology, and apply it to our data and an equivalent dataset for Drosophila melanogaster. Our analysis reveals that all individual genes differ quantitatively in their spatio-temporal expression patterns between these species, primarily in terms of their relative position and dynamics. Despite many small quantitative differences, cellular gene expression profiles for the whole set of genes examined are largely similar. This suggests that cell types at this stage of development are conserved, though they can differ in their relative position by up to 3-4 cell widths and in their relative proportion between species by as much as 5-fold. Quantitative differences in the dynamics and relative level of a subset of genes between corresponding cell types may reflect altered regulatory functions between species. 
Our results emphasize that transcriptional networks can diverge over short evolutionary timescales and that even small changes can lead to distinct output in terms of the placement and number of equivalent cells.




Tensor Decomposition Reveals Concurrent Evolutionary Convergences and Divergences and Correlations with Structural Motifs in Ribosomal RNA
C. Muralidhara, A.M. Gross, R.R. Gutell, O. Alter. In PLoS ONE, Vol. 6, No. 4, Public Library of Science, pp. e18768. April, 2011.
DOI: 10.1371/journal.pone.0018768

Evolutionary relationships among organisms are commonly described by using a hierarchy derived from comparisons of ribosomal RNA (rRNA) sequences. We propose that even on the level of a single rRNA molecule, an organism's evolution is composed of multiple pathways due to concurrent forces that act independently upon different rRNA degrees of freedom. Relationships among organisms are then compositions of coexisting pathway-dependent similarities and dissimilarities, which cannot be described by a single hierarchy. We computationally test this hypothesis in comparative analyses of 16S and 23S rRNA sequence alignments by using a tensor decomposition, i.e., a framework for modeling composite data. Each alignment is encoded in a cuboid, i.e., a third-order tensor, where nucleotides, positions and organisms each represent a degree of freedom. A tensor mode-1 higher-order singular value decomposition (HOSVD) is formulated such that it separates each cuboid into combinations of patterns of nucleotide frequency variation across organisms and positions, i.e., "eigenpositions" and corresponding nucleotide-specific segments of "eigenorganisms," respectively, independent of a priori knowledge of the taxonomic groups or rRNA structures. We find, in support of our hypothesis, that, first, the significant eigenpositions reveal multiple similarities and dissimilarities among the taxonomic groups. Second, the corresponding eigenorganisms identify insertions or deletions of nucleotides exclusively conserved within the corresponding groups, that map out entire substructures and are enriched in adenosines, unpaired in the rRNA secondary structure, that participate in tertiary structure interactions. This demonstrates that structural motifs involved in rRNA folding and function are evolutionary degrees of freedom.
Third, two previously unknown coexisting subgenic relationships between Microsporidia and Archaea are revealed in both the 16S and 23S rRNA alignments, a convergence and a divergence, conferred by insertions and deletions of these motifs, which cannot be described by a single hierarchy. This shows that mode-1 HOSVD modeling of rRNA alignments might be used to computationally predict evolutionary mechanisms.
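
The encoding described in this abstract (an alignment as a nucleotides × positions × organisms cuboid, separated into patterns across organisms and nucleotide-specific segments across positions) can be sketched in NumPy. This is one plausible reading under stated assumptions, not the authors' code: the one-hot encoding, the five-symbol alphabet with a gap character, the unfold-then-SVD route, the helper names `encode_alignment` and `unfold_svd`, and the four-sequence toy alignment are all hypothetical.

```python
import numpy as np

ALPHABET = "ACGU-"  # assumed symbols; "-" marks an alignment gap

def encode_alignment(seqs):
    """One-hot encode equal-length aligned sequences into a
    nucleotides x positions x organisms tensor."""
    n_pos = len(seqs[0])
    tensor = np.zeros((len(ALPHABET), n_pos, len(seqs)))
    for k, seq in enumerate(seqs):
        for j, ch in enumerate(seq):
            tensor[ALPHABET.index(ch), j, k] = 1.0
    return tensor

def unfold_svd(tensor):
    """Unfold the cuboid to (nucleotides*positions) x organisms and take its SVD.

    Rows of vt are patterns across organisms; columns of u, reshaped,
    are patterns across positions split into nucleotide-specific segments.
    """
    n_nuc, n_pos, n_org = tensor.shape
    matrix = tensor.reshape(n_nuc * n_pos, n_org)
    u, sv, vt = np.linalg.svd(matrix, full_matrices=False)
    return u.reshape(n_nuc, n_pos, -1), sv, vt

# Toy alignment only: 4 hypothetical organisms, 5 aligned positions
alignment = ["ACGU-", "ACGUA", "AGGU-", "UCGAA"]
tensor = encode_alignment(alignment)
segments, sv, eigenpositions = unfold_svd(tensor)
```

In this sketch, `eigenpositions[r]` plays the role of a pattern across organisms and `segments[:, :, r]` the matching pattern across positions, one slice per nucleotide, with `sv[r]` giving the weight of that pair in the alignment.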



 
Global Effects of DNA Replication and DNA Replication Origin Activity on Eukaryotic Gene Expression
L. Omberg, J.R. Meyerson, K. Kobayashi, L.S. Drury, J.F.X. Diffley, O. Alter. In Molecular Systems Biology, Vol. 5, Article 312 (published online). October, 2009.
DOI: 10.1038/msb.2009.70

 
A Tensor Higher-Order Singular Value Decomposition for Integrative Analysis of DNA Microarray Data From Different Studies
L. Omberg, G.H. Golub, O. Alter. In Proceedings of the National Academy of Sciences, Vol. 104, No. 47, pp. 18371--18376. November, 2007.
DOI: 10.1073/pnas.0709146104

 
Genomic Signal Processing: From Matrix Algebra to Genetic Networks
O. Alter. In Microarray Data Analysis: Methods in Molecular Biology, Vol. 377, Edited by M.J. Korenberg, Humana Press, Totowa, pp. 17--59. 2007.
DOI: 10.1007/978-1-59745-390-5_2

 
Discovery of Principles of Nature from Mathematical Modeling of DNA Microarray Data
O. Alter. In Proceedings of the National Academy of Sciences, Vol. 103, No. 44, pp. 16063--16064. October, 2006.
DOI: 10.1073/pnas.0607650103

 
Singular Value Decomposition of Genome-Scale mRNA Lengths Distribution Reveals Asymmetry in RNA Gel Electrophoresis Band Broadening
O. Alter, G.H. Golub. In Proceedings of the National Academy of Sciences, Vol. 103, No. 32, pp. 11828--11833. August, 2006.
DOI: 10.1073/pnas.0604756103

 
Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations
O. Alter, G.H. Golub. In Proceedings of the National Academy of Sciences, Vol. 102, No. 49, pp. 17559--17564. December, 2005.
DOI: 10.1073/pnas.0509033102