Catalogue Search | MBRL

Identifying the genes impacted by cell proliferation in proteomics and transcriptomics studies

by Locard-Paulet, Marie , Jensen, Lars Juhl , Palasca, Oana in Biology and Life Sciences , Breast cancer , Cancer

2022

Hypothesis-free high-throughput profiling allows relative quantification of thousands of proteins or transcripts across samples and thereby identification of differentially expressed genes. It is used in many biological contexts to characterize differences between cell lines and tissues, identify drug mode of action or drivers of drug resistance, among others. Changes in gene expression can also be due to confounding factors that were not accounted for in the experimental plan, such as change in cell proliferation. We combined the analysis of 1,076 and 1,040 cell lines in five proteomics and three transcriptomics data sets to identify 157 genes that correlate with cell proliferation rates. These include actors in DNA replication and mitosis, and genes periodically expressed during the cell cycle. This signature of cell proliferation is a valuable resource when analyzing high-throughput data showing changes in proliferation across conditions. We show how to use this resource to help in interpretation of in vitro drug screens and tumor samples. It informs on differences of cell proliferation rates between conditions where such information is not directly available. The signature genes also highlight which hits in a screen may be due to proliferation changes; this can either contribute to biological interpretation or help focus on experiment-specific regulation events otherwise buried in the statistical analysis.

Journal Article

Alcoholic liver disease: A registry view on comorbidities and disease prediction

by Grissa, Dhouha , Brunak, Søren , Nytoft Rasmussen, Ditlev in Alcohol , Alcoholic liver diseases , Alcohols

2020

Alcoholic-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as these are largely asymptomatic. Consequently, the majority of ALD patients are only diagnosed by the time ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses helping to understand the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries covering the entire population of Denmark during nineteen years (1996-2014), to examine if it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach by using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnoses of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and their matched patients with non-ALD as controls. Machine learning models (SVM, RF, LightGBM and NaiveBayes) trained and tested on the set of ALC patients achieve a high performance for data classification (AUC = 0.89).
When testing the same trained models on the small set of ALF patients, their performance unsurprisingly drops a lot (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in prediction of ALF. Some of these groups are conditions either caused by alcohol or caused by malnutrition associated with alcohol-overuse. Others are comorbidities either related to trauma and life-style or to complications to cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.

Journal Article

Pre-Clovis Mastodon Hunting 13,800 Years Ago at the Manis Site, Washington

by Waters, Michael R. , Gilbert, M. Thomas P. , Cappellini, Enrico in Americas , Anatomy , Animals

2011

The tip of a projectile point made of mastodon bone is embedded in a rib of a single disarticulated mastodon at the Manis site in the state of Washington. Radiocarbon dating and DNA analysis show that the rib is associated with the other remains and dates to 13,800 years ago. Thus, osseous projectile points, common to the Beringian Upper Paleolithic and Clovis, were made and used during pre-Clovis times in North America. The Manis site, combined with evidence of mammoth hunting at sites in Wisconsin, provides evidence that people were hunting proboscideans at least two millennia before Clovis.

Journal Article

Identification of Novel Type 1 Diabetes Candidate Genes by Integrating Genome-Wide Association Data, Protein-Protein Interactions, and Human Pancreatic Islet Gene Expression

by Berchtold, Lukas A. , Palleja, Albert , Størling, Joachim in Beta cells , Biological and medical sciences , CD83 antigen

2012

Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize and substantiate these networks, we performed expressional profiling in human pancreatic islets exposed to proinflammatory cytokines. Three networks were significantly enriched for cytokine-regulated genes and, thus, likely to play an important role for type 1 diabetes in pancreatic islets. Eight of the regulated genes (CD83, IFNGR1, IL17RD, TRAF3IP2, IL27RA, PLCG2, MYO1B, and CXCR7) in these networks also harbored single nucleotide polymorphisms nominally associated with type 1 diabetes. Finally, the expression and cytokine regulation of these new candidate genes were confirmed in insulin-secreting INS-1 β-cells. Our results provide novel insight to the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.

Journal Article

Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records

by Hjaltelin, Jessica Xin , Westergaard, David , Chen, Inna M in Abdomen , Analysis , Anorexia

2023

Pancreatic cancer is one of the deadliest cancer types with poor treatment options. Better detection of early symptoms and relevant disease correlations could improve pancreatic cancer prognosis. In this retrospective study, we used symptom and disease codes (ICD-10) from the Danish National Patient Registry (NPR) encompassing 6.9 million patients from 1994 to 2018, of whom 23,592 were diagnosed with pancreatic cancer. The Danish cancer registry included 18,523 of these patients. To complement and compare the registry diagnosis codes with deeper clinical data, we used a text mining approach to extract symptoms from free text clinical notes in electronic health records (3078 pancreatic cancer patients and 30,780 controls). We used both data sources to generate and compare symptom disease trajectories to uncover temporal patterns of symptoms prior to pancreatic cancer diagnosis for the same patients. We show that the text mining of the clinical notes was able to complement the registry-based symptoms by capturing more symptoms prior to pancreatic cancer diagnosis. For example, ‘Blood pressure reading without diagnosis’, ‘Abnormalities of heartbeat’, and ‘Intestinal obstruction’ were not found for the registry-based analysis. Chaining symptoms together in trajectories identified two groups of patients with lower median survival (<90 days) following the trajectories ‘Cough→Jaundice→Intestinal obstruction’ and ‘Pain→Jaundice→Abnormal results of function studies’. These results provide a comprehensive comparison of the two types of pancreatic cancer symptom trajectories, which in combination can leverage the full potential of the health data and ultimately provide a fuller picture for detection of early risk factors for pancreatic cancer. Pancreatic cancer is one of the deadliest cancer types. Scientists predict it will become the second largest cause of cancer-related deaths in 2030.
It has few or no symptoms at early stages and often goes undetected for an extended period. As a result, patients are often diagnosed at an advanced stage when they have few treatment options and lower survival rates. Only 11 percent of patients with pancreatic cancer survive five years past their diagnosis. Earlier detection and surgery to remove the tumor increase patient survival to 42% at five years. Those who undergo surgery at the earliest stage have an 84% survival rate at five years. Developing ways to screen for and detect pancreatic cancer early could improve patient survival. Identifying early symptoms is critical. So far, studies show links between weight loss, abdominal pain, lower back pain, and new-onset diabetes and pancreatic cancer. But clinicians often overlook these symptoms or do not associate them with cancer. National health registries may be data sources that scientists can use to zoom in on early pancreatic symptoms and create alerts for clinicians. Hjaltelin, Novitski et al. identified potential pancreatic cancer symptoms using patient registry data and electronic health records. Hjaltelin, Novitski et al. extracted potential pancreatic cancer-related disease or symptom trajectories from 7 million patients listed in the Danish National Patient Registry. They also scoured clinical notes in 34,000 patients’ electronic health records for symptoms. The electronic health records yielded more promising symptoms than the registry. But both data sources produced complementary information. The analysis showed that some symptoms, like jaundice, were associated with higher survival rates because they may lead to earlier diagnosis. The data so far suggest that symptoms leading up to a pancreatic cancer diagnosis may be nonspecific and not occur in a particular order. As the cancer progresses, symptoms may become more specific and severe. Further assessment of the study’s results is necessary. 
Tools like artificial intelligence or advanced text mining may allow scientists to identify more definitive early symptom trajectories and help clinicians identify patients earlier.

Journal Article

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

by Huerta-Cepas, Jaime , Coelho, Luis Pedro , Forslund, Kristoffer in Annotations , Genomes , Homology

2017

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Journal Article

Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining

by Rodríguez, Cristina Leal , Kirk, Isa Kristina , Grarup, Niels in Adolescent , Adult , Aged

2019

Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark) so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.

Journal Article

Arena3Dweb: interactive 3D visualization of multilayered networks supporting multiple directional information channels, clustering analysis and application integration

by Jensen, Lars Juhl , Karatzas, Evangelos , Schneider, Reinhard in Algorithms , Application programming interface , Bioinformatics

2023

Arena3Dweb is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3Dweb supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3Dweb for 3D multi-layer visualization. Arena3Dweb is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org

Journal Article

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

by Westergaard, David , Tønsberg, Christian , Brunak, Søren in Abstracting and Indexing as Topic , Abstracts , Acids

2018

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

Journal Article

Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning

by Webel, Henry , Mann, Matthias , Niu, Lili in 140/58 , 631/114/1314 , 631/114/2397

2024

Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Here we demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can impute missing values in the context of LFQ at different levels. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. Removing 20 percent of the intensities we were able to recover 15 out of 17 significant abundant protein groups using PIMMS-VAE imputations. When analyzing the full dataset we identified 30 additional proteins (+13.2%) that were significantly differentially abundant across disease stages compared to no imputation and found that some of these were predictive of ALD progression in machine learning models. We, therefore, suggest the use of deep learning approaches for imputing missing values in MS-based proteomics on larger datasets and provide workflows for these. Imputation in mass spectrometry-based proteomics is a recurrent step of importance for downstream analysis. Here, the authors offer an extensive comparison workflow of 27 established with three new scalable, fast and performant methods from deep learning for large and high-dimensional data.

Journal Article
