Catalogue Search | MBRL
159 result(s) for "Wang Mingxun"
SIMILE enables alignment of tandem mass spectra with statistical significance
2022
Interrelating small molecules according to their aligned fragmentation spectra is central to tandem mass spectrometry-based untargeted metabolomics. Current alignment algorithms do not provide statistical significance, and compounds that have multiple delocalized structural differences often fail to have their fragment ions aligned. Here we align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE). SIMILE yields structural connections in molecular networks, inferred from spectral alignment, that are not found with cosine-based scoring algorithms. In addition, it is now possible to rank spectral alignments based on p-values in the exploration of structural relationships between compounds and to enhance the chemical connectivity that can be obtained with molecular networking.
Interrelating metabolites by their fragmentation spectra is central to metabolomics. Here the authors align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE).
Journal Article
Learning representations of microbe–metabolite interactions
by Morton, James T; Aksenov, Alexander A; Louis Felix Nothias
in Bioengineering, Biology, Conditional probability
2019
Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets has multiple statistical challenges. We solve this problem by using neural networks (https://github.com/biocore/mmvec) to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. We show with known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, our ability to recover microbe–metabolite relationships, and demonstrate how the method can discover relationships between microbially produced metabolites and inflammatory bowel disease.
Journal Article
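To make the quantity in the abstract above concrete, the following is a minimal sketch of a conditional presence estimate, P(molecule present | microbe present), computed by plain counting across samples. This is a simple counting baseline chosen for illustration, not the neural-network method the abstract describes; the function name and the sample layout (dictionaries with "microbes" and "molecules" sets) are assumptions.

```python
def conditional_presence(samples, microbe):
    """Estimate P(molecule present | microbe present) by counting.

    `samples` is a hypothetical layout: a list of dicts, each with a
    "microbes" set and a "molecules" set observed in that sample.
    """
    # Keep only the samples in which the microbe was observed.
    with_microbe = [s for s in samples if microbe in s["microbes"]]
    if not with_microbe:
        return {}
    # Every molecule seen in at least one of those samples.
    molecules = set().union(*(s["molecules"] for s in with_microbe))
    n = len(with_microbe)
    # Fraction of microbe-positive samples that also contain each molecule.
    return {
        m: sum(m in s["molecules"] for s in with_microbe) / n
        for m in molecules
    }


samples = [
    {"microbes": {"A"}, "molecules": {"x", "y"}},
    {"microbes": {"A", "B"}, "molecules": {"x"}},
    {"microbes": {"B"}, "molecules": {"z"}},
]
probabilities = conditional_presence(samples, "A")
```

A counting estimate like this becomes unreliable when a microbe appears in only a few samples, which is one reason a learned model may be preferred.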
Significance estimation for large scale metabolomics annotations by spectral matching
by Scheubert, Kerstin; Nothias, Louis-Félix; Dorrestein, Pieter C.
in 631/114/2415, 631/45/320, 639/638/11
2017
The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from −92% to +5705%) when compared with a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to assess the scoring criteria for large scale analysis of mass spectrometry based metabolomics data, a capability that has been essential in the advancement of proteomics, transcriptomics, and genomics science.
Matching fragment spectra to reference library spectra is an important procedure for annotating small molecules in untargeted mass spectrometry based metabolomics studies. Here, the authors develop strategies to estimate false discovery rates (FDR) by empirical Bayes and target-decoy based methods which enable a user to define the scoring criteria for spectral matching.
Journal Article
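The target-decoy idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the generic counting estimate, FDR at a threshold ≈ (decoy matches at or above the threshold) / (target matches at or above the threshold). The function names and example scores are hypothetical, and this is not the authors' specific procedure or their empirical Bayes variant.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as decoy hits / target hits."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, n_decoy / n_target)


def lowest_threshold_at_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is
    within max_fdr, or None if no threshold qualifies."""
    best = None
    for t in sorted(set(target_scores), reverse=True):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            best = t
    return best


# Hypothetical match scores against a real (target) and a decoy library.
targets = [0.95, 0.9, 0.85, 0.8, 0.6, 0.5]
decoys = [0.7, 0.55, 0.4, 0.3]
cutoff = lowest_threshold_at_fdr(targets, decoys, max_fdr=0.25)
```

Because the cutoff depends on the score distributions of each data set, a sketch like this also shows why the matching settings may need to be tuned per project.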
Propagating annotations of molecular networks using in silico fragmentation
by van der Hooft, Justin J. J.; Balunas, Marcy J.; Lopes, Norberto Peporine
in Animals, Annotations, Ants - microbiology
2018
The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web platform at https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
Journal Article
Molecular cartography of the human skin surface in 3D
by Bouslimani, Amina; Dorrestein, Kathleen; Dorrestein, Pieter C.
in Adult, bacteria, Biological Sciences
2015
Significance
The paper describes the implementation of an approach to study the chemical makeup of human skin surface and correlate it to the microbes that live in the skin. We provide the translation of molecular information in high-spatial resolution 3D to understand the body distribution of skin molecules and bacteria. In addition, we use integrative analysis to interpret, at a molecular level, the large scale of data obtained from human skin samples. Correlations between molecules and microbes can be obtained to further gain insights into the chemical milieu in which these different microbial communities live.
The human skin is an organ with a surface area of 1.5–2 m² that provides our interface with the environment. The molecular composition of this organ is derived from host cells, microbiota, and external molecules. The chemical makeup of the skin surface is largely undefined. Here we advance the technologies needed to explore the topographical distribution of skin molecules, using 3D mapping of mass spectrometry data and microbial 16S rRNA amplicon sequences. Our 3D maps reveal that the molecular composition of skin has diverse distributions and that the composition is defined not only by skin cells and microbes but also by our daily routines, including the application of hygiene products. The technological development of these maps lays a foundation for studying the spatial relationships of human skin with hygiene, the microbiota, and environment, with potential for developing predictive models of skin phenotypes tailored to individual health.
Journal Article
foodMASST a mass spectrometry search tool for foods and beverages
2022
There is a growing interest in unraveling the chemical complexity of our diets. To help the scientific community gain insight into the molecules present in foods and beverages that we ingest, we created foodMASST, a search tool for MS/MS spectra (of both known and unknown molecules) against a growing metabolomics food and beverage reference database. We envision foodMASST will become valuable for nutrition research and to assess the potential uniqueness of dietary biomarkers to represent specific foods or food classes.
Journal Article
Chemically informed analyses of metabolomics mass spectrometry data with Qemistree
by van der Hooft, Justin J. J.; Zhu, Qiyun; Gauglitz, Julia M.
in 631/1647/296, 631/92/320, 631/92/349
2021
Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply ecological tools that are designed to analyze and visualize the relatedness of DNA sequences to metabolomics data. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin, and a global natural products social molecular networking workflow.
Qemistree uses fragmentation spectra to predict molecular fingerprints and represent their relationships as a tree, enabling comparison of metabolomics data across different experimental conditions and exploration of chemical diversity in mixtures.
Journal Article
MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools
by Kang, Kyo Bin; van der Hooft, Justin J.J.; Medema, Marnix H.
in Annotations, chemical classification, Computer applications
2019
Metabolomics has started to embrace computational approaches for chemical interpretation of large data sets. Yet, metabolite annotation remains a key challenge. Recently, molecular networking and MS2LDA emerged as molecular mining tools that find molecular families and substructures in mass spectrometry fragmentation data. Moreover, in silico annotation tools obtain and rank candidate molecules for fragmentation spectra. Ideally, all structural information obtained and inferred from these computational tools could be combined to increase the resulting chemical insight one can obtain from a data set. However, integration is currently hampered as each tool has its own output format and efficient matching of data across these tools is lacking. Here, we introduce MolNetEnhancer, a workflow that combines the outputs from molecular networking, MS2LDA, in silico annotation tools (such as Network Annotation Propagation or DEREPLICATOR), and the automated chemical classification through ClassyFire to provide a more comprehensive chemical overview of metabolomics data whilst at the same time illuminating structural details for each fragmentation spectrum. We present examples from four plant and bacterial case studies and show how MolNetEnhancer enables the chemical annotation, visualization, and discovery of the subtle substructural diversity within molecular families. We conclude that MolNetEnhancer is a useful tool that greatly assists the metabolomics researcher in deciphering the metabolome through combination of multiple independent in silico pipelines.
Journal Article
An evaluation methodology for machine learning-based tandem mass spectra similarity prediction
2025
Background
Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry (MS/MS) data, called molecular networking, organizes and visualizes putatively structurally related compounds. However, a key bottleneck of this approach is the comparison of MS/MS spectra used to identify nearby structural neighbors. Machine learning (ML) approaches have emerged as a promising technique to predict structural similarity from MS/MS that may surpass the current state-of-the-art algorithmic methods. However, the comparison between these different ML methods remains a challenge because there is a lack of standardization to benchmark, evaluate, and compare MS/MS similarity methods, and there are no methods that address data leakage between training and test data in order to analyze model generalizability.
Result
In this work, we present the creation of a new evaluation methodology using a train/test split that allows for the evaluation of machine learning models at varying degrees of structural similarity between training and test sets. We also introduce a training and evaluation framework that measures prediction accuracy on domain-inspired annotation and retrieval metrics designed to mirror real-world applications. We further show how two alternative training methods that leverage MS specific insights (e.g., similar instrumentation, collision energy, adduct) affect method performance and demonstrate the orthogonality of the proposed metrics. We especially highlight the role that collision energy plays in prediction errors. Finally, we release a continually updated version of our dataset online along with our data cleaning and splitting pipelines for community use.
Conclusion
It is our hope that this benchmark will serve as the basis of development for future machine learning approaches in MS/MS similarity and facilitate comparison between models. We anticipate that the introduced set of evaluation metrics allows for a better reflection of practical performance.
Journal Article
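One way to picture evaluating a model "at varying degrees of structural similarity between training and test sets" is sketched below. The sketch groups test compounds by their highest Tanimoto similarity to any training compound, so that accuracy could then be reported per group. The fingerprint representation (sets of on-bit indices), the bin edges, and the function names are all assumptions made for illustration; this is not the benchmark's actual splitting pipeline.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bin_test_by_train_similarity(train, test, edges=(0.4, 0.7)):
    """Group test compound names by max similarity to the training set.

    Bin 0 holds compounds below edges[0], bin 1 those in
    [edges[0], edges[1]), and so on. The edges are hypothetical defaults.
    """
    bins = {i: [] for i in range(len(edges) + 1)}
    for name, fingerprint in test.items():
        max_sim = max(
            (tanimoto(fingerprint, t) for t in train.values()), default=0.0
        )
        # Count how many edges the similarity reaches to get the bin index.
        bins[sum(max_sim >= e for e in edges)].append(name)
    return bins


train = {"t1": {1, 2, 3, 4}}
test = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {7, 8}}
groups = bin_test_by_train_similarity(train, test)
```

Reporting metrics separately for the low-similarity group gives a rough view of how a model generalizes to structures unlike those it was trained on.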
BLINK enables ultrafast tandem mass spectrometry cosine similarity scoring
by de Jong, Wibe; Treen, Daniel G. C.; Wang, Mingxun
in 631/1647/320, 639/638/11/296, Agreements
2023
Metabolomics has a long history of using cosine similarity to match experimental tandem mass spectra to databases for compound identification. Here we introduce the Blur-and-Link (BLINK) approach for scoring cosine similarity. By bypassing fragment alignment and simultaneously scoring all pairs of spectra using sparse matrix operations, BLINK is over 3000 times faster than MatchMS, a widely used loop-based alignment and scoring implementation. Using a similarity cutoff of 0.7, BLINK and MatchMS had practically equivalent identification agreement, and greater than 99% of their scores and matching ion counts were identical. This performance improvement can enable calculations to be performed that would typically be limited by time and available computational resources.
Journal Article
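The general idea of scoring all spectrum pairs at once without per-pair fragment alignment can be sketched as follows. This is a simplified binned cosine computed through an inverted index, which is equivalent to a sparse matrix product; it is not BLINK's actual implementation, it omits any blurring step, and the bin width and function names are assumptions.

```python
from collections import defaultdict
from math import sqrt


def bin_spectrum(peaks, bin_width=0.01):
    """Turn (m/z, intensity) peaks into a unit-length sparse vector
    stored as {bin_index: weight}."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(round(mz / bin_width))] += intensity
    norm = sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def all_pairs_cosine(queries, references, bin_width=0.01):
    """Cosine score for every query/reference pair in one pass."""
    q_vecs = [bin_spectrum(s, bin_width) for s in queries]
    r_vecs = [bin_spectrum(s, bin_width) for s in references]
    # Inverted index: bin -> [(reference index, weight)], i.e. the
    # non-zero entries of one column of the sparse reference matrix.
    index = defaultdict(list)
    for j, vec in enumerate(r_vecs):
        for b, w in vec.items():
            index[b].append((j, w))
    scores = [[0.0] * len(references) for _ in queries]
    for i, vec in enumerate(q_vecs):
        for b, w in vec.items():
            # Only references sharing this bin contribute to the dot product.
            for j, rw in index.get(b, ()):
                scores[i][j] += w * rw
    return scores


spectrum_a = [(100.0, 1.0), (200.0, 2.0)]
spectrum_b = [(100.0, 1.0), (200.0, 2.0)]
spectrum_c = [(300.0, 1.0)]
scores = all_pairs_cosine([spectrum_a], [spectrum_b, spectrum_c])
```

The work done scales with the number of shared non-zero bins and not with the number of spectrum pairs, which is the kind of saving that sparse formulations aim for.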