Catalogue Search | MBRL

External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination

by Siontis, George C.M. , Tzoulaki, Ioanna , Ioannidis, John P.A. in Area Under Curve , Area under the receiver operating characteristics curve , Biomarkers

2015

To evaluate how often newly developed risk prediction models undergo external validation and how well they perform in such validations. We reviewed derivation studies of newly proposed risk models and their subsequent external validations. Study characteristics, outcome(s), and models' discriminatory performance [area under the curve (AUC)] in derivation and validation studies were extracted. We estimated the probability of having a validation and the change in discriminatory performance under more stringent external validation, by overlapping or different authors, compared with the derivation estimates. We evaluated 127 new prediction models. Of those, for 32 models (25%), at least one external validation study was identified; in 22 models (17%), the validation had been done by entirely different authors. The probability of having an external validation by different authors within 5 years was 16%. AUC estimates significantly decreased during external validation vs. the derivation study [median AUC change: −0.05 (P < 0.001) overall; −0.04 (P = 0.009) for validation by overlapping authors; −0.05 (P < 0.001) for validation by different authors]. On external validation, AUC decreased by at least 0.03 in 19 models and never increased by at least 0.03 (P < 0.001). External independent validation of predictive models in different studies is uncommon. Predictive performance may worsen substantially on external validation.

Journal Article

Integrated 'Omics in Idiopathic Pulmonary Fibrosis: Where Do We Go from Here?

by Castaldi, Peter J. in Alveoli , Chronic obstructive pulmonary disease , Cystic fibrosis

2021

Innovative studies with 'omics technologies have led to important insights into the pathophysiology of idiopathic pulmonary fibrosis (IPF). Studies using single-cell transcriptomics have demonstrated the role of novel epithelial and immune cell types in IPF lungs by identifying novel profibrotic alveolar macrophage populations and dedifferentiated epithelial cell populations that secrete profibrotic mediators. Genetic studies of familial pulmonary fibrosis have identified disease-causing genetic variants that disrupt genes involved in telomere maintenance, surfactant production, and mitochondrial functions, and studies of sporadic IPF have identified 14 genome-wide significant associations, including the particularly strong association with a promoter variant causing increased expression of MUC5B. As in many other 'omics studies of IPF lung tissue, the sheer volume of associations can be overwhelming, and it is a challenge to make sense of the thousands of differentially associated RNAs, miRNAs, proteins, and methylated DNA regions.

Journal Article

Bipartite Community Structure of eQTLs

by Platig, John , Quackenbush, John , Castaldi, Peter J. in Algorithms , Biology and Life Sciences , Chronic obstructive lung disease

2016

Genome Wide Association Studies (GWAS) and expression quantitative trait locus (eQTL) analyses have identified genetic associations with a wide range of human phenotypes. However, many of these variants have weak effects and understanding their combined effect remains a challenge. One hypothesis is that multiple SNPs interact in complex networks to influence functional processes that ultimately lead to complex phenotypes, including disease states. Here we present CONDOR, a method that represents both cis- and trans-acting SNPs and the genes with which they are associated as a bipartite graph and then uses the modular structure of that graph to place SNPs into a functional context. In applying CONDOR to eQTLs in chronic obstructive pulmonary disease (COPD), we found the global network "hub" SNPs were devoid of disease associations through GWAS. However, the network was organized into 52 communities of SNPs and genes, many of which were enriched for genes in specific functional classes. We identified local hubs within each community ("core SNPs") and these were enriched for GWAS SNPs for COPD and many other diseases. These results speak to our intuition: rather than single SNPs influencing single genes, we see groups of SNPs associated with the expression of families of functionally related genes and that disease SNPs are associated with the perturbation of those functions. These methods are not limited in their application to COPD and can be used in the analysis of a wide variety of disease processes and other phenotypic traits.

Journal Article

A generalized higher-order correlation analysis framework for multi-omics network inference

by Liu, Weixuan , Hersh, Craig , Pratte, Katherine A. in Access control , Algorithms , Biological analysis

2025

Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features but integration is challenging due to multiple data sets with high dimensionality. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data when using a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.

Journal Article

Multi-omics subtyping pipeline for chronic obstructive pulmonary disease

by Stene, Evan , Schuyler, Ronald P. , Zhuang, Yonghua in Age Factors , Aged , Biology and Life Sciences

2021

Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of mortality in the United States; however, COPD has heterogeneous clinical phenotypes. This is the first large-scale attempt that uses transcriptomics, proteomics, and metabolomics (multi-omics) to determine whether there are molecularly defined clusters with distinct clinical phenotypes that may underlie the clinical heterogeneity. The study included 3,278 subjects from the COPDGene cohort with at least one of the following profiles: whole blood transcriptomes (2,650 subjects); plasma proteomes (1,013 subjects); and plasma metabolomes (1,136 subjects). 489 subjects had all three contemporaneous -omics profiles. Autoencoder embeddings were performed individually for each -omics dataset. Embeddings underwent subspace clustering using MineClus, either individually by -omics or combined, followed by recursive feature selection based on Support Vector Machines. Clusters were tested for associations with clinical variables. Optimal single -omics clustering typically resulted in two clusters. Although there was overlap for individual -omics cluster membership, each -omics cluster tended to be defined by unique molecular pathways. For example, prominent molecular features of the metabolome-based clustering included sphingomyelin, while key molecular features of the transcriptome-based clusters were related to immune and bacterial responses. We also found that when we integrated the -omics data at a later stage, we identified subtypes that varied based on age and severity of disease, in addition to diffusing capacity of the lungs for carbon monoxide and percent on atrial fibrillation. In contrast, when we integrated the -omics data at an earlier stage by treating all data sets equally, there were no clinical differences between subtypes.
Similar to clinical clustering, which has revealed multiple heterogeneous clinical phenotypes, we show that transcriptomics, proteomics, and metabolomics tend to define clusters of COPD patients with different clinical characteristics. Thus, integrating these different -omics data sets affords additional insight into the molecular nature of COPD and its heterogeneity.

Journal Article

Enhanced protein isoform characterization through long-read proteogenomics

by Sheynkman, Gloria M. , Mehlferber, Madison M. , Millikin, Robert J. in algorithms , Alternative Splicing , Animal Genetics and Genomics

2022

Background: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. Results: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. Conclusions: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.

Journal Article

Partial correlation network analysis identifies coordinated gene expression within a regional cluster of COPD genome-wide association signals

by Platig, John , Silverman, Edwin K. , Gentili, Michele in Biology and Life Sciences , Case-Control Studies , Chromosome 4

2024

Chronic obstructive pulmonary disease (COPD) is a complex disease influenced by well-established environmental exposures (most notably, cigarette smoking) and incompletely defined genetic factors. The chromosome 4q region harbors multiple genetic risk loci for COPD, including signals near HHIP, FAM13A, GSTCD, TET2, and BTC. Leveraging RNA-Seq data from lung tissue in COPD cases and controls, we estimated the co-expression network for genes in the 4q region bounded by HHIP and BTC (~70MB), through partial correlations informed by protein-protein interactions. We identified several co-expressed gene pairs based on partial correlations, including NPNT-HHIP, BTC-NPNT, and FAM13A-TET2, which were replicated in independent lung tissue cohorts. Upon clustering the co-expression network, we observed that four genes previously associated with COPD (BTC, HHIP, NPNT, and PPM1K) appeared in the same network community. Finally, we discovered a sub-network of genes differentially co-expressed between COPD cases and controls (including FAM13A, PPA2, PPM1K, and TET2). Many of these genes were previously implicated in cell-based knock-out experiments, including the knocking out of SPP1, which belongs to the same genomic region and could be a potential local key regulatory gene. These analyses identify chromosome 4q as a region enriched for COPD genetic susceptibility and differential co-expression.

Journal Article

Genetic Advances in Chronic Obstructive Pulmonary Disease. Insights from COPDGene

by Silverman, Edwin K. , Bowler, Russell P. , Hersh, Craig P. in Aged , Aged, 80 and over , Bioinformatics

2019

Chronic obstructive pulmonary disease (COPD) is a common and progressive disease that is influenced by both genetic and environmental factors. For many years, knowledge of the genetic basis of COPD was limited to Mendelian syndromes, such as alpha-1 antitrypsin deficiency and cutis laxa, caused by rare genetic variants. Over the past decade, the proliferation of genome-wide association studies, the accessibility of whole-genome sequencing, and the development of novel methods for analyzing genetic variation data have led to a substantial increase in the understanding of genetic variants that play a role in COPD susceptibility and COPD-related phenotypes. COPDGene (Genetic Epidemiology of COPD), a multicenter, longitudinal study of over 10,000 current and former cigarette smokers, has been pivotal to these breakthroughs in understanding the genetic basis of COPD. To date, over 20 genetic loci have been convincingly associated with COPD affection status, with additional loci demonstrating association with COPD-related phenotypes such as emphysema, chronic bronchitis, and hypoxemia. In this review, we discuss the contributions of the COPDGene study to the discovery of these genetic associations as well as the ongoing genetic investigations of COPD subtypes, protein biomarkers, and post–genome-wide association study analysis.

Journal Article

Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

by Cho, Michael , Bowler, Russell , Hersh, Craig in Aged , Artificial neural networks , Biology and Life Sciences

2021

Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic curve) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.

Journal Article

Heritability of Chronic Obstructive Pulmonary Disease and Related Phenotypes in Smokers

by Hersh, Craig P. , Laird, Nan M. , Zhou, Jin J. in Anesthesia. Intensive care medicine. Transfusions. Cell therapy and gene therapy , Biological and medical sciences , Chromosomes

2013

Rationale: Previous studies of chronic obstructive pulmonary disease (COPD) have suggested that genetic factors play an important role in the development of disease. However, single-nucleotide polymorphisms that are associated with COPD in genome-wide association studies have been shown to account for only a small percentage of the genetic variance in phenotypes of COPD, such as spirometry and imaging variables. These phenotypes are highly predictive of disease, and family studies have shown that spirometric phenotypes are heritable. Objectives: To assess the heritability and coheritability of four major COPD-related phenotypes (measurements of FEV1, FEV1/FVC, percent emphysema, and percent gas trapping), and COPD affection status in smokers of non-Hispanic white and African American descent using a population design. Methods: Single-nucleotide polymorphisms from genome-wide association studies chips were used to calculate the relatedness of pairs of individuals and a mixed model was adopted to estimate genetic variance and covariance. Measurements and Main Results: In the non-Hispanic whites, estimated heritabilities of FEV1 and FEV1/FVC were both about 37%, consistent with estimates in the literature from family-based studies. For chest computed tomography scan phenotypes, estimated heritabilities were both close to 25%. Heritability of COPD affection status was estimated as 37.7% in both populations. Conclusions: This study suggests that a large portion of the genetic risk of COPD is yet to be discovered and gives rationale for additional genetic studies of COPD. The estimates of coheritability (genetic covariance) for pairs of the phenotypes suggest considerable overlap of causal genetic loci.

Journal Article
