Catalogue Search | MBRL

JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES

by Nobel, Andrew B. , Hoadley, Katherine A. , Lock, Eric F. in Correlations , data fusion , Data integration

2013

Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such data sets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data and provides new directions for the visual exploration of joint and individual structures. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types. Data and software are available at https://genome.unc.edu/jive/.
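
The three-term decomposition described in this abstract (joint low-rank structure, individual low-rank structure per data type, residual noise) can be illustrated with a small NumPy sketch. This is a simplified alternating scheme written for illustration only, not the authors' software: the function names `low_rank` and `jive_sketch`, the fixed ranks, the iteration count, and the synthetic data are all assumptions, and preprocessing steps such as centering or scaling the blocks are left out.

```python
import numpy as np

def low_rank(M, r):
    """Best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def jive_sketch(blocks, r_joint, r_indiv, n_iter=50):
    """Alternate between a joint low-rank fit and per-block individual fits.

    blocks: list of (p_i x n) float arrays that share the same n columns (samples).
    Returns two lists, the joint and the individual estimate for each block.
    """
    splits = np.cumsum([b.shape[0] for b in blocks])[:-1]
    X = np.vstack(blocks)
    A = np.zeros_like(X)
    for _ in range(n_iter):
        # Joint term: low-rank fit to the stacked data minus individual structure.
        J = low_rank(X - A, r_joint)
        # Projector onto the orthogonal complement of the joint row space.
        _, _, Vt = np.linalg.svd(J, full_matrices=False)
        V = Vt[:r_joint].T
        P = np.eye(X.shape[1]) - V @ V.T
        # Individual terms: per-block low-rank fits, kept orthogonal to the joint term.
        resid = np.split(X - J, splits, axis=0)
        A = np.vstack([low_rank(R @ P, r) for R, r in zip(resid, r_indiv)])
    return np.split(J, splits, axis=0), np.split(A, splits, axis=0)

# Synthetic two-block example (hypothetical data, not TCGA).
rng = np.random.default_rng(0)
n = 40
joint_scores = rng.normal(size=n)
X1 = (np.outer(rng.normal(size=30), joint_scores)
      + np.outer(rng.normal(size=30), rng.normal(size=n))
      + 0.1 * rng.normal(size=(30, n)))
X2 = (np.outer(rng.normal(size=20), joint_scores)
      + np.outer(rng.normal(size=20), rng.normal(size=n))
      + 0.1 * rng.normal(size=(20, n)))

J_list, A_list = jive_sketch([X1, X2], r_joint=1, r_indiv=[1, 1])
residuals = [X - J - A for X, J, A in zip([X1, X2], J_list, A_list)]
```

Each block is then written as joint estimate plus individual estimate plus residual, which mirrors the three terms named in the abstract. Rank selection, which the sketch takes as given, is a separate problem.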

Journal Article

Identification of shared and disease-specific host gene–microbiome associations across human diseases using multi-omic integration

by Adamowicz, Beth , Burns, Michael B. , Mars, Ruben A. T. in 45/91 , 631/114/1305 , 631/208/199

2022

While gut microbiome and host gene regulation independently contribute to gastrointestinal disorders, it is unclear how the two may interact to influence host pathophysiology. Here we developed a machine learning-based framework to jointly analyse paired host transcriptomic (n = 208) and gut microbiome (n = 208) profiles from colonic mucosal samples of patients with colorectal cancer, inflammatory bowel disease and irritable bowel syndrome. We identified associations between gut microbes and host genes that depict shared as well as disease-specific patterns. We found that a common set of host genes and pathways implicated in gastrointestinal inflammation, gut barrier protection and energy metabolism are associated with disease-specific gut microbes. Additionally, we also found that mucosal gut microbes that have been implicated in all three diseases, such as Streptococcus, are associated with different host pathways in each disease, suggesting that similar microbes can affect host pathophysiology in a disease-specific manner through regulation of different host genes. Our framework can be applied to other diseases for the identification of host gene–microbiome associations that may influence disease outcomes. A machine learning framework for integrating multi-omic high-dimensional datasets identified disease-specific and shared host gene–microbiome associations across three gastrointestinal diseases.

Journal Article

A hierarchical spike-and-slab model for pan-cancer survival using pan-omic data

by Samorodnitsky, Sarah , Lock, Eric F. , Hoadley, Katherine A. in Algorithms , Analysis , Bayes Theorem

2022

Background: Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer. However, such analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict clinical outcomes. We address the issue of prediction across multiple high-dimensional sources of data and sample sets by using molecular patterns identified by BIDIFAC+, a method for integrative dimension reduction of bidimensionally-linked matrices, in a Bayesian hierarchical model. Our model performs variable selection through spike-and-slab priors that borrow information across clustered data. We use this model to predict overall patient survival from the Cancer Genome Atlas with data from 29 cancer types and 4 omics sources and use simulations to characterize the performance of the hierarchical spike-and-slab prior. Results: We found that molecular patterns shared across all or most cancers were largely not predictive of survival. However, our model selected patterns unique to subsets of cancers that differentiate clinical tumor subtypes with markedly different survival outcomes. Some of these subtypes were previously established, such as subtypes of uterine corpus endometrial carcinoma, while others may be novel, such as subtypes within a set of kidney carcinomas. Through simulations, we found that the hierarchical spike-and-slab prior performs best in terms of variable selection accuracy and predictive power when borrowing information is advantageous, but also offers competitive performance when it is not. Conclusions: We address the issue of prediction across multiple sources of data by using results from BIDIFAC+ in a Bayesian hierarchical model for overall patient survival. By incorporating spike-and-slab priors that borrow information across cancers, we identified molecular patterns that distinguish clinical tumor subtypes within a single cancer and within a group of cancers. We also corroborate the flexibility and performance of using spike-and-slab priors as a Bayesian variable selection approach.

Journal Article

Tensor-on-Tensor Regression

by Lock, Eric F. in Dimension Reduction , Multiway data , PARAFAC/CANDECOMP

2018

I propose a framework for the linear prediction of a multiway array (i.e., a tensor) from another multiway array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. I describe an approach that exploits the multiway structure of both the predictors and the outcomes by restricting the coefficients to have reduced PARAFAC/CANDECOMP rank. I propose a general and efficient algorithm for penalized least-squares estimation, which allows for a ridge (L2) penalty on the coefficients. The objective is shown to give the mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for inference. I illustrate the approach with an application to facial image data. An R package is available at https://github.com/lockEF/MultiwayRegression.
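
The prediction step in this abstract, a contracted tensor product with a coefficient array of reduced PARAFAC/CANDECOMP rank, can be sketched in NumPy. The sketch covers the forward prediction only, not the penalized estimation or the Gibbs sampler, and it is independent of the R package. The dimensions, the rank `R`, and the factor names `U1`, `U2`, `V1`, `V2` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sizes: N samples, predictor modes P1 x P2, outcome modes Q1 x Q2.
N, P1, P2, Q1, Q2, R = 5, 4, 3, 2, 6, 2

X = rng.normal(size=(N, P1, P2))  # predictor tensor, one P1 x P2 slice per sample

# Rank-R coefficient array built from one factor matrix per mode.
U1, U2 = rng.normal(size=(P1, R)), rng.normal(size=(P2, R))
V1, V2 = rng.normal(size=(Q1, R)), rng.normal(size=(Q2, R))
B = np.einsum("ir,jr,kr,lr->ijkl", U1, U2, V1, V2)  # shape (P1, P2, Q1, Q2)

# Contracted tensor product: sum over the predictor modes shared by X and B.
Y_hat = np.tensordot(X, B, axes=2)  # shape (N, Q1, Q2)

# The rank restriction replaces P1*P2*Q1*Q2 free coefficients
# with R*(P1+P2+Q1+Q2) factor entries.
n_full = B.size
n_factor = R * (P1 + P2 + Q1 + Q2)
```

With these sizes the full coefficient array has 144 entries while the factor matrices hold 30, which shows why the rank restriction makes estimation tractable for larger arrays.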

Journal Article

Human cytomegalovirus in breast milk is associated with milk composition and the infant gut microbiome and growth

by Heisel, Timothy , Isganaitis, Elvira , Lock, Eric F. in 38/91 , 45/23 , 631/250/255/2514

2024

Human cytomegalovirus (CMV) is a highly prevalent herpesvirus that is often transmitted to the neonate via breast milk. Postnatal CMV transmission can have negative health consequences for preterm and immunocompromised infants, but any effects on healthy term infants are thought to be benign. Furthermore, the impact of CMV on the composition of the hundreds of bioactive factors in human milk has not been tested. Here, we utilize a cohort of exclusively breastfeeding full-term mother-infant pairs to test for differences in the milk transcriptome and metabolome associated with CMV, and the impact of CMV in breast milk on the infant gut microbiome and infant growth. We find upregulation of the indoleamine 2,3-dioxygenase (IDO) tryptophan-to-kynurenine metabolic pathway in CMV+ milk samples, and that CMV+ milk is associated with decreased Bifidobacterium in the infant gut. Our data indicate two opposing CMV-associated effects on infant growth: kynurenine is positively correlated, and CMV viral load negatively correlated, with infant weight-for-length at 1 month of age. These results suggest CMV transmission, CMV-related changes in milk composition, or both may be modulators of full-term infant development. Cytomegalovirus (CMV) is often transmitted to infants through breast milk. Here, in a cohort of exclusively breastfeeding full-term mother-infant pairs, the authors identify changes in milk composition, infant growth, and the infant gut microbiome associated with the presence of CMV in milk.

Journal Article

Empirical Bayes linked matrix decomposition

by Lock, Eric F. in Algorithms , Artificial Intelligence , Bayesian analysis

2024

Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular “omics” technologies may capture different feature sets (e.g., corresponding to rows in a matrix) and/or different sample populations (corresponding to columns). This has motivated a large body of work on integrative matrix factorization approaches that identify and decompose low-dimensional signal that is shared across multiple matrices or specific to a given matrix. We propose an empirical variational Bayesian approach to this problem that has several advantages over existing techniques, including the flexibility to accommodate shared signal over any number of row or column sets (i.e., bidimensional integration), an intuitive model-based objective function that yields appropriate shrinkage for the inferred signals, and a relatively efficient estimation algorithm with no tuning parameters. A general result establishes conditions for the uniqueness of the underlying decomposition for a broad family of methods that includes the proposed approach. For scenarios with missing data, we describe an associated iterative imputation approach that is novel for the single-matrix context and a powerful approach for “blockwise” imputation (in which an entire row or column is missing) in various linked matrix contexts. Extensive simulations show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal, accurately decomposing shared and specific signals, and accurately imputing missing data. The approach is applied to gene expression and miRNA data from breast cancer tissue and normal breast tissue, for which it gives an informative decomposition of variation and outperforms alternative strategies for missing data imputation.

Journal Article

Novel approach to exploring protease activity and targets in HIV-associated obstructive lung disease using combined proteomic-peptidomic analysis

by Leung, Janice M. , Kruk, Monica , Weise, Danielle in Adult , Aptamers , Binding sites

2024

Background: Obstructive lung disease (OLD) is increasingly prevalent among persons living with HIV (PLWH). However, the role of proteases in HIV-associated OLD remains unclear. Methods: We combined proteomics and peptidomics to comprehensively characterize protease activities. We combined mass spectrometry (MS) analysis on bronchoalveolar lavage fluid (BALF) peptides and proteins from PLWH with OLD (n = 25) and without OLD (n = 26) with a targeted Somascan aptamer-based proteomic approach to quantify individual proteases and assess their correlation with lung function. Endogenous peptidomics mapped peptides to native proteins to identify substrates of protease activity. Using the MEROPS database, we identified candidate proteases linked to peptide generation based on binding site affinities which were assessed via z-scores. We used t-tests to compare average forced expiratory volume in 1 s per predicted value (FEV1pp) between samples with and without detection of each cleaved protein and adjusted for multiple comparisons by controlling the false discovery rate (FDR). Findings: We identified 101 proteases, of which 95 had functional network associations and 22 correlated with FEV1pp. These included cathepsins, metalloproteinases (MMP), caspases and neutrophil elastase. We discovered 31 proteins subject to proteolytic cleavage that associate with FEV1pp, with the top pathways involved in small ubiquitin-like modifier mediated modification (SUMOylation). Proteases linked to protein cleavage included neutrophil elastase, granzyme, and cathepsin D. Interpretations: In HIV-associated OLD, a significant number of proteases are up-regulated, many of which are involved in protein degradation. These proteases degrade proteins involved in cell cycle and protein stability, thereby disrupting critical biological functions.

Journal Article

Prediction With Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

by Adam Kaplan , Eric F Lock in Methodology

2017

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal component analysis (PCA). However, the application of PCA is not straightforward for multisource data, wherein multiple sources of ‘omics data measure different but related biological components. In this article, we use recent advances in the dimension reduction of multisource data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multisource data, for prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for patients with glioblastoma multiforme from 3 data sources measuring messenger RNA expression, microRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function jive.predict.

Journal Article

Comprehensive proteomic classifier for molecular characterisation of pulmonary sarcoidosis: protocol for a longitudinal multi-centre study to evaluate bronchoalveolar fluid and cell diagnostic and prognostic biomarkers of pulmonary sarcoidosis

by Dincer, Erhan H , Li, Li , Fingerlin, Tasha E in Bioinformatics , Biomarkers , Biomarkers - analysis

2026

Introduction: Sarcoidosis is a multisystem disorder with variable presentation and disease course. Diagnosis requires the exclusion of other causes of granulomatous inflammation. Current clinical management is often fraught with diagnostic uncertainty and the lack of tools to predict pulmonary disease progression. To address these challenges, we designed a study using data from bronchoalveolar lavage (BAL) fluid and cells to develop diagnostic and prognostic tools in patients with pulmonary sarcoidosis. Methods and Analysis: This multicentre study will include discovery and validation cohorts of healthy controls, interstitial lung disease controls and pulmonary sarcoidosis cases from three study sites. Sarcoidosis participants will be grouped into progressive and non-progressive pulmonary disease based on changes in pulmonary function testing, chest radiographs or treatment requirements. The discovery cohort consists of participants with existing BAL fluid, BAL cells, and clinical datasets; the validation cohort will be prospectively enrolled and participants will consent for BAL collection from either a clinical or research bronchoscopy. Untargeted proteomic profiling of BALF along with statistical modelling with variable selection techniques will generate a classifier for diagnosis and prognosis. Targeted proteomics using parallel reaction monitoring–mass spectrometry will be used for internal and external validation. Additionally, BAL cell single-cell gene-expression analysis using Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) will be integrated with proteome-wide data to elucidate cell-specific pathways implicated in the development and progression of sarcoidosis. Ethics and Dissemination: The study will be conducted in accordance with Good Clinical Practice and the Declaration of Helsinki. The protocol has been approved by the Biomedical Research Alliance of New York Institutional Review Board (IRB), which serves as the single IRB across all study sites. The findings of this study will be presented as abstracts at scientific meetings and summarised in peer-reviewed journal manuscripts.

Journal Article

A Pan-Cancer and Polygenic Bayesian Hierarchical Model for the Effect of Somatic Mutations on Survival

by Lock, Eric F , Hoadley, Katherine A , Samorodnitsky, Sarah in Algorithms , Bayesian analysis , Cancer

2020

We built a novel Bayesian hierarchical survival model based on the somatic mutation profile of patients across 50 genes and 27 cancer types. The pan-cancer quality allows for the model to “borrow” information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues of origin or tumor types. The effect of a mutation at each gene was allowed to vary by cancer type, whereas the mean effect of each gene was shared across cancers. Within this framework, we considered 4 parametric survival models (normal, log-normal, exponential, and Weibull), and we compared their performance via a cross-validation approach in which we fit each model on training data and estimate the log-posterior predictive likelihood on test data. The log-normal model gave the best fit, and we investigated the partial effect of each gene on survival via a forward selection procedure. Through this we determined that mutations at TP53 and FAT4 were together the most useful for predicting patient survival. We validated the model via simulation to ensure that our algorithm for posterior computation gave nominal coverage rates. The code used for this analysis can be found at https://github.com/sarahsamorodnitsky/Pan-Cancer-Survival-Modeling.git, and the results are summarized at http://ericfrazerlock.com/surv_figs/SurvivalDisplay.html.

Journal Article
