Catalogue Search | MBRL

A scored human protein–protein interaction network to catalyze genomic interpretation

by Horn, Heiko; Slodkowicz, Greg; Stærfeldt, Hans H in 631/114/129/2044, 631/114/2406, 631/208/191

2017

InWeb_InBioMap (InWeb_IM for short) is a scored, integrated human protein–protein interaction network resource aggregated from public, experimentally determined protein–protein interactions. The resource enables functional interpretation of large-scale genomics data. Genome-scale human protein–protein interaction networks are critical to understanding cell biology and interpreting genomic data, but challenging to produce experimentally. Through data integration and quality control, we provide a scored human protein–protein interaction network (InWeb_InBioMap, or InWeb_IM) with severalfold more interactions (>500,000) and better functional biological relevance than comparable resources. We illustrate that InWeb_InBioMap enables functional interpretation of >4,700 cancer genomes and genes involved in autism.

Journal Article

Assessment of network module identification across complex diseases

by Mall, Raghvendra; Kim, Yoo-Ah; Banf, Michael in Algorithms, Benchmarks, Bioinformatics

2019

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the ‘Disease Module Identification DREAM Challenge’, an open competition to comprehensively assess module identification methods across diverse protein–protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.

Journal Article

Quantitative maps of protein phosphorylation sites across 14 different rat organs and tissues

by Dmytriyev, Anatoliy; Secher, Anna; Lage, Kasper in 631/114/2784, 631/45/612/1246, Animals

2012

Deregulated cellular signalling is a common hallmark of disease, and delineating tissue phosphoproteomes is key to unravelling the underlying mechanisms. Here we present the broadest tissue catalogue of phosphoproteins to date, covering 31,480 phosphorylation sites on 7,280 proteins quantified across 14 rat organs and tissues. We provide the data set as an easily accessible resource via a web-based database, the CPR PTM Resource. A major fraction of the presented phosphorylation sites are tissue-specific and modulate protein interaction networks that are essential for the function of individual organs. For skeletal muscle, we find that phosphotyrosines are over-represented, which is mainly due to proteins involved in glycogenolysis and muscle contraction, a finding we validate in human skeletal muscle biopsies. Tyrosine phosphorylation is involved in both skeletal and cardiac muscle contraction, whereas glycogenolytic enzymes are tyrosine phosphorylated in skeletal muscle but not in the liver. The presented phosphoproteomic method is simple and rapid, making it applicable for screening of diseased tissue samples. The function of proteins is often regulated by their phosphorylation at specific amino-acid residues. The authors of this article have catalogued phosphoproteins and their phosphorylation sites in 14 rat organs and tissues, and provide these data as a resource for researchers.

Journal Article

Comprehensive assessment of cancer missense mutation clustering in protein structures

by Leshchiner, Ignaty; Polak, Paz; Lage, Kasper in Algorithms, Biological Sciences, Cancer

2015

Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg²⁺, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.

Journal Article

Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

by Ahmed, Shehab S.; Pérez-Palma, Eduardo; Wagner, Florence F. in Amino Acid Sequence, Amino acids, Biological Sciences

2020

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.

Journal Article

Coexpression network architecture reveals the brain-wide and multiregional basis of disease susceptibility

by Hartl, Christopher L.; Pembroke, William G.; Battle, Alexis in 631/208/199, 631/208/212/2019, 631/208/366/1373

2021

Gene networks have yielded numerous neurobiological insights, yet an integrated view across brain regions is lacking. We leverage RNA sequencing in 864 samples representing 12 brain regions to robustly identify 12 brain-wide, 50 cross-regional and 114 region-specific coexpression modules. Nearly 40% of genes fall into brain-wide modules, while 25% comprise region-specific modules reflecting regional biology, such as oxytocin signaling in the hypothalamus, or addiction pathways in the nucleus accumbens. Schizophrenia and autism genetic risk are enriched in brain-wide and multiregional modules, indicative of broad impact; these modules implicate neuronal proliferation and activity-dependent processes, including endocytosis and splicing, in disease pathophysiology. We find that cell-type-specific long noncoding RNA and gene isoforms contribute substantially to regional synaptic diversity and that constrained, mutation-intolerant genes are primarily enriched in neurons. We leverage these data using an omnigenic-inspired network framework to characterize how coexpression and gene regulatory networks reflect neuropsychiatric disease risk, supporting polygenic models. The authors construct brain-wide coexpression networks to characterize regional versus global features, determine if disease susceptibility maps onto regional or brain-wide processes and assess how these networks capture genetic models of disease risk.

Journal Article

Systematic multi-trait AAV capsid engineering for efficient gene delivery

by Powell, Megan; Brauer, Pamela P.; Eid, Fatma-Elzahraa in 42/40, 42/44, 42/47

2024

Broadening gene therapy applications requires manufacturable vectors that efficiently transduce target cells in humans and preclinical models. Conventional selections of adeno-associated virus (AAV) capsid libraries are inefficient at searching the vast sequence space for the small fraction of vectors possessing multiple traits essential for clinical translation. Here, we present Fit4Function, a generalizable machine learning (ML) approach for systematically engineering multi-trait AAV capsids. By leveraging a capsid library that uniformly samples the manufacturable sequence space, reproducible screening data are generated to train accurate sequence-to-function models. Combining six models, we designed a multi-trait (liver-targeted, manufacturable) capsid library and validated 88% of library variants on all six predetermined criteria. Furthermore, the models, trained only on mouse in vivo and human in vitro Fit4Function data, accurately predicted AAV capsid variant biodistribution in macaque. Top candidates exhibited production yields comparable to AAV9, efficient murine liver transduction, up to 1000-fold greater human hepatocyte transduction, and increased enrichment relative to AAV9 in a screen for liver transduction in macaques. The Fit4Function strategy ultimately makes it possible to predict cross-species traits of peptide-modified AAV capsids and is a critical step toward assembling an ML atlas that predicts AAV capsid performance across dozens of traits. Conventional selections of AAV capsid libraries are inefficient at searching sequence space. Here the authors report ‘Fit4Function’, a generalizable ML approach for systematically engineering multi-trait AAV capsids, and use this to predict cross-species traits of peptide-modified AAV capsids.

Journal Article

Translating polygenic risk scores for clinical use by estimating the confidence bounds of risk prediction

by Hougaard, David M.; Folkersen, Lasse; Orho-Melander, Marju in 45/43, 631/114/1305, 631/208/457

2021

A promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, applying PRS to predict an individual’s disease susceptibility in a clinical setting is challenging because PRS typically provide a relative measure of risk evaluated at the level of a group of people rather than at the individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. MCCP can report a conditional probability value for disease status for each individual and give a prediction at a desired error level. Moreover, given a user-defined prediction error rate, MCCP can estimate the proportion of samples (coverage) that receive a correct prediction. The application of polygenic risk scores to individual-level disease susceptibility is challenging, as risk is evaluated at the group level. Here, the authors describe a machine learning method, Mondrian Cross-Conformal Prediction, that reports disease status conditional probability values at the individual level.

Journal Article

NetSig: network-based discovery from cancer genomes

by Horn, Heiko; Chouinard, Candace R; Hu, Jessica Xin in AKT2 protein, Cancer, Computer applications

2018

NetSig is a network-based statistic that identifies cancer driver genes with high accuracy and can be combined with gene-based statistical tests; results are validated with a large-scale in vivo tumorigenesis assay. Methods that integrate molecular network information and tumor genome data could complement gene-based statistical tests to identify likely new cancer genes, but such approaches are challenging to validate at scale, and their predictive value remains unclear. We developed a robust statistic (NetSig) that integrates protein interaction networks with data from 4,742 tumor exomes. NetSig can accurately classify known driver genes in 60% of tested tumor types and predicts 62 new driver candidates. Using a quantitative experimental framework to determine in vivo tumorigenic potential in mice, we found that NetSig candidates induce tumors at rates that are comparable to those of known oncogenes and are ten-fold higher than those of random genes. By reanalyzing nine tumor-inducing NetSig candidates in 242 patients with oncogene-negative lung adenocarcinomas, we find that two (AKT2 and TFDP2) are significantly amplified. Our study presents a scalable integrated computational and experimental workflow to expand discovery from cancer genomes.

Journal Article

Genoppi is an open-source software for robust and standardized integration of proteomic and genetic data

by Malolepsza, Edyta; Lage, Kasper; Hsu, Yu-Han H. in 13/100, 13/106, 631/114/2784

2021

Combining genetic and cell-type-specific proteomic datasets can generate biological insights and therapeutic hypotheses, but a technical and statistical framework for such analyses is lacking. Here, we present an open-source computational tool called Genoppi (lagelab.org/genoppi) that enables robust, standardized, and intuitive integration of quantitative proteomic results with genetic data. We use Genoppi to analyze 16 cell-type-specific protein interaction datasets of four proteins (BCL2, TDP-43, MDM2, PTEN) involved in cancer and neurological disease. Through systematic quality control of the data and integration with published protein interactions, we show a general pattern of both cell-type-independent and cell-type-specific interactions across three cancer cell types and one human iPSC-derived neuronal cell type. Furthermore, through the integration of proteomic and genetic datasets in Genoppi, our results suggest that the neuron-specific interactions of these proteins are mediating their genetic involvement in neurodegenerative diseases. Importantly, our analyses suggest that human iPSC-derived neurons are a relevant model system for studying the involvement of BCL2 and TDP-43 in amyotrophic lateral sclerosis. Genetic variation can impact protein complexes and interaction networks, but reconciling genetic and proteomic information remains challenging. To address this need, the authors develop Genoppi, a computational tool for integrating genetics and cell-type-specific proteomics data.

Journal Article