Catalogue Search | MBRL

Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra

by Gerwick, William H. , Ludwig, Marcus , Reher, Raphael in 631/114/1305 , 631/92/320 , Agriculture

2021

Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level. Unknown metabolites are classified from mass spectrometry data.

Journal Article

High-confidence structural annotation of metabolites absent from spectral libraries

by Ludwig, Marcus , Hoffmann, Martin A. , Witting, Michael in 631/114/1314 , 631/337 , Agriculture

2022

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries. COSMIC outperforms spectral library search for metabolite annotation and annotates previously unseen structures.

Journal Article

MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem

by Hoffmann, Martin , Kretschmer, Fleming , Ludwig, Marcus in Accuracy , Annotations , Computational chemistry

2023

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter ‘u’. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.

Journal Article

RepoRT: a comprehensive repository for small molecule retention times

by Kretschmer, Fleming , Hoffmann, Martin A. , Witting, Michael in 631/114/129 , 631/1647/2196 , 631/45/320

2024

Journal Article

Database-independent molecular formula annotation using Gibbs sampling through ZODIAC

by Koester, Irina , Ludwig, Marcus , Hoffmann, Martin A. in 631/114 , 631/45/320 , Algorithms

2020

The confident high-throughput identification of small molecules is one of the most challenging tasks in mass spectrometry-based metabolomics. Annotating the molecular formula of a compound is the first step towards its structural elucidation. Yet even the annotation of molecular formulas remains highly challenging. This is particularly so for large compounds above 500 daltons, and for de novo annotations, for which we consider all chemically feasible formulas. Here we present ZODIAC, a network-based algorithm for the de novo annotation of molecular formulas. Uniquely, it enables fully automated and swift processing of complete experimental runs, providing high-quality, high-confidence molecular formula annotations. This allows us to annotate novel molecular formulas that are absent from even the largest public structure databases. Our method re-ranks molecular formula candidates by considering joint fragments and losses between fragmentation trees. We employ Bayesian statistics and Gibbs sampling. Thorough algorithm engineering ensures fast processing in practice. We evaluate ZODIAC on five datasets, producing results substantially (up to 16.5-fold) better than for several other methods, including SIRIUS, which is the state-of-the-art algorithm for molecular formula annotation at present. Finally, we report and verify several novel molecular formulas annotated by ZODIAC. To infer a previously unknown molecular formula from mass spectrometry data is a challenging, yet neglected problem. Ludwig and colleagues present a network-based approach to ranking possible formulas.

Journal Article

Publisher Correction: Database-independent molecular formula annotation using Gibbs sampling through ZODIAC

by Koester, Irina , Ludwig, Marcus , Hoffmann, Martin A. in 631/114 , 631/45/320 , Engineering

2020

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Journal Article

Classes for the masses: Systematic classification of unknowns using fragmentation spectra

by Ludwig, Marcus , Hoffmann, Martin A , Dorrestein, Pieter C in Bioinformatics , Colonization , Computer applications

2020

Metabolomics experiments can employ non-targeted tandem mass spectrometry to detect hundreds to thousands of molecules in a biological sample. Structural annotation of molecules is typically carried out by searching their fragmentation spectra in spectral libraries or, recently, in structure databases. Annotations are limited to structures present in the library or database employed, prohibiting a thorough utilization of the experimental data. We present a computational tool for systematic compound class annotation: CANOPUS uses a deep neural network to predict 1,270 compound classes from fragmentation spectra, and explicitly targets compounds where neither spectral nor structural reference data are available. CANOPUS even predicts classes for which no MS/MS training data are available. We demonstrate the broad utility of CANOPUS by investigating the effect of the microbial colonization in the digestive system in mice, and through analysis of the chemodiversity of different Euphorbia plants; both uniquely revealing biological insights at the compound class level. Competing Interest Statement: SB, KD, ML, MF, and MAH are co-founders of Bright Giant GmbH. PCD is scientific advisor for Sirenas LLC.

Paper

Assigning confidence to structural annotations from mass spectra with COSMIC

by Hoffmann, Martin A , Ludwig, Marcus , Gentry, Emily C in Advisors , Annotations , Bioinformatics

2021

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. As biological interpretation relies on accurate structure annotations, the ability to assign confidence to such annotations is a key outstanding problem. We introduce the COSMIC workflow that combines structure database generation, in silico annotation, and a confidence score consisting of kernel density p-value estimation and a Support Vector Machine with enforced directionality of features. In evaluation, COSMIC annotates a substantial number of hits at small false discovery rates, and outperforms spectral library search for this purpose. To demonstrate that COSMIC can annotate structures never reported before, we annotated twelve novel bile acid conjugates; nine structures were confirmed by manual evaluation and two structures using synthetic standards. Second, we annotated and manually evaluated 315 molecular structures in human samples currently absent from the Human Metabolome Database. Third, we applied COSMIC to 17,400 experimental runs and annotated 1,715 structures with high confidence that were absent from spectral libraries. Competing Interest Statement: S.B., K.D., M.L., M.F., and M.A.H. are co-founders of Bright Giant GmbH. P.C.D. is scientific advisor for Sirenas LLC, Galileo, Cybele and is scientific advisor and co-founder of Enveda and Ometa. Footnotes: https://bio.informatik.uni-jena.de/cosmic

Paper

ZODIAC: database-independent molecular formula annotation using Gibbs sampling reveals unknown small molecules

by Koester, Irina , Hoffmann, Martin Andre , Ludwig, Marcus in Algorithms , Bayesian analysis , Bioinformatics

2019

The confident high-throughput identification of small molecules remains one of the most challenging tasks in mass spectrometry-based metabolomics. SIRIUS has become a powerful tool for the interpretation of tandem mass spectra, and shows outstanding performance for identifying the molecular formula of a query compound, being the first step of structure identification. Nevertheless, the identification of both molecular formulas for large compounds above 500 daltons and novel molecular formulas remains highly challenging. Here, we present ZODIAC, a network-based algorithm for the de novo estimation of molecular formulas. ZODIAC reranks SIRIUS' molecular formula candidates, combining fragmentation tree computation with Bayesian statistics using Gibbs sampling. Through careful algorithm engineering, ZODIAC's Gibbs sampling is very swift in practice. ZODIAC decreases incorrect annotations 16.2-fold on a challenging plant extract dataset with most compounds above 700 daltons; we then show improvements on four additional, diverse datasets. Our analysis led to the discovery of compounds with novel molecular formulas such as C24H47BrNO8P which, as of today, is not present in any publicly available molecular structure database. Footnotes: https://bio.informatik.uni-jena.de/

Paper
