Catalogue Search | MBRL

Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis

by Lai, Rachel PJ , Ebbels, Timothy , Frainay, Clément in Biology and Life Sciences , Computational Biology - methods , Computer and Information Sciences

2021

Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.
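ORA is commonly computed as a right-tailed hypergeometric (Fisher's exact) test on the overlap between a pathway and the list of differential metabolites. The sketch below is an illustration under that assumption, not code from the article: the function name `ora_p_value` and the toy metabolite identifiers are hypothetical. It shows how the same hit list can yield a much smaller p-value when a large generic background replaces the assay-specific one.

```python
from math import comb

def ora_p_value(background, pathway, hits):
    """Right-tailed hypergeometric p-value for pathway over-representation.

    background: all metabolites that could have been detected
    pathway:    metabolites annotated to the pathway
    hits:       metabolites selected as differentially abundant
    """
    background = set(background)
    # Only compounds inside the background can contribute to the test.
    pathway = set(pathway) & background
    hits = set(hits) & background
    n_bg, n_path, n_hits = len(background), len(pathway), len(hits)
    overlap = len(pathway & hits)
    # P(X >= overlap) when drawing n_hits compounds without replacement.
    tail = sum(
        comb(n_path, i) * comb(n_bg - n_path, n_hits - i)
        for i in range(overlap, min(n_path, n_hits) + 1)
    )
    return tail / comb(n_bg, n_hits)

# Toy data (hypothetical identifiers): 20 compounds measurable by the assay,
# versus a generic background of 200 compounds.
assay_background = [f"m{i}" for i in range(20)]
generic_background = [f"m{i}" for i in range(200)]
pathway = ["m0", "m1", "m2", "m3", "m4"]
hits = ["m0", "m1", "m2", "m10"]

p_assay = ora_p_value(assay_background, pathway, hits)
p_generic = ora_p_value(generic_background, pathway, hits)
print(f"assay-specific background: p = {p_assay:.4g}")
print(f"generic background:        p = {p_generic:.4g}")
```

With these toy inputs the generic background gives the smaller p-value, which is the mechanism by which a non-assay-specific background can inflate the number of pathways called significant.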

Journal Article


Machine learning to identify pairwise interactions between specific IgE antibodies and their association with asthma: A cross-sectional analysis within a population-based birth cohort

by Fontanella, Sara , Murray, Clare S. , Simpson, Angela in Allergens , Allergens - immunology , Allergies

2018

The relationship between allergic sensitisation and asthma is complex; the data about the strength of this association are conflicting. We propose that the discrepancies arise in part because allergic sensitisation may not be a single entity (as considered conventionally) but a collection of several different classes of sensitisation. We hypothesise that pairings between immunoglobulin E (IgE) antibodies to individual allergenic molecules (components), rather than IgE responses to 'informative' molecules, are associated with increased risk of asthma. In a cross-sectional analysis among 461 children aged 11 years participating in a population-based birth cohort, we measured serum-specific IgE responses to 112 allergen components using a multiplex array (ImmunoCAP Immuno‑Solid phase Allergy Chip [ISAC]). We characterised sensitivity to 44 active components (specific immunoglobulin E [sIgE] > 0.30 units in at least 5% of children) among the 213 (46.2%) participants sensitised to at least one of these 44 components. We adopted several machine learning methodologies that offer a powerful framework to investigate the highly complex sIgE-asthma relationship. Firstly, we applied network analysis and hierarchical clustering (HC) to explore the connectivity structure of component-specific IgEs and identify clusters of component-specific sensitisation ('component clusters'). Of the 44 components included in the model, 33 grouped in seven clusters (C.sIgE-1-7), and the remaining 11 formed singleton clusters. Cluster membership mapped closely to the structural homology of proteins and/or their biological source. Components in the pathogenesis-related (PR)-10 proteins cluster (C.sIgE-5) were central to the network and mediated connections between components from grass (C.sIgE-4), trees (C.sIgE-6), and profilin clusters (C.sIgE-7) with those in mite (C.sIgE-1), lipocalins (C.sIgE-3), and peanut clusters (C.sIgE-2). 
We then used HC to identify four common 'sensitisation clusters' among study participants: (1) multiple sensitisation (sIgE to multiple components across all seven component clusters and singleton components), (2) predominantly dust mite sensitisation (IgE responses mainly to components from C.sIgE-1), (3) predominantly grass and tree sensitisation (sIgE to multiple components across C.sIgE-4-7), and (4) lower-grade sensitisation. We used a bipartite network to explore the relationship between component clusters, sensitisation clusters, and asthma, and the joint density-based nonparametric differential interaction network analysis and classification (JDINAC) to test whether pairwise interactions of component-specific IgEs are associated with asthma. JDINAC with pairwise interactions provided a good balance between sensitivity (0.84) and specificity (0.87), and outperformed penalised logistic regression with individual sIgE components in predicting asthma, with an area under the curve (AUC) of 0.94, compared with 0.73. We then inferred the differential network of pairwise component-specific IgE interactions, which demonstrated that 18 pairs of components predicted asthma. These findings were confirmed in an independent sample of children aged 8 years who participated in the same birth cohort but did not have component-resolved diagnostics (CRD) data at age 11 years. The main limitation of our study was the exclusion of potentially important allergens caused by both the ISAC chip resolution and the filtering step. Clustering and the network analyses might have provided different solutions if additional components had been available. Interactions between pairs of sIgE components are associated with increased risk of asthma and may provide the basis for designing diagnostic tools for asthma.

Journal Article


A strategy to detect metabolic changes induced by exposure to chemicals from large sets of condition-specific metabolic models computed with enumeration techniques

by Fresnais, Louison , Perin, Olivier , Riu, Anne in Algorithms , Amiodarone , Analysis

2024

Background: The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes. Results: To address this, we propose a strategy to reduce information overload by computing, based on transcriptomics data, a comprehensive metabolic sub-network reflecting the metabolic impact of a chemical. The proposed strategy integrates transcriptomic data with a genome-scale metabolic network through enumeration of condition-specific metabolic models, hence translating transcriptomics data into reaction activity probabilities. Based on these results, a graph algorithm is applied to retrieve user-readable sub-networks reflecting the possible metabolic MoA (mMoA) of chemicals. This strategy has been implemented as a three-step workflow. The first step consists in building cell condition-specific models reflecting the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In a second step, we address the challenge of analyzing thousands of enumerated condition-specific networks by computing differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions, representing possible mMoA, using the distance-based clustering and subnetwork extraction method.
The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocyte transcriptomic data in Open TG-GATEs. Then, we further applied this strategy to more precisely model and visualize associated mMoA for two of these eight molecules (amiodarone and valproic acid). The approach proved to go beyond gene-based analysis by identifying mMoA when few genes are significantly differentially expressed (2 differentially expressed genes (DEGs) for amiodarone), bringing additional information from the network topology, or when a very large number of genes were differentially expressed (5709 DEGs for valproic acid). In both cases, the results of our strategy fitted well with evidence from the literature regarding known MoA. Beyond these confirmations, the workflow highlighted other potential unexplored mMoA. Conclusion: The proposed strategy allows toxicology experts to decipher which part of cellular metabolism is expected to be affected by exposure to a given chemical. The originality of the approach resides in the combination of different metabolic modelling approaches (constraint-based and graph modelling). The application to two model molecules shows the strong potential of the approach for interpretation and visual mining of complex omics in vitro data. The presented strategy is freely available as a Python module (https://pypi.org/project/manamodeller/) and Jupyter notebooks (https://github.com/LouisonF/MANA).

Journal Article


Genome scale metabolic network modelling for metabolic profile predictions

by Ebbels, Timothy , Frainay, Clément , Jourdan, Fabien in Biology and Life Sciences , Biomarkers , Computer and Information Sciences

2024

Metabolic profiling (metabolomics) aims at measuring small molecules (metabolites) in complex samples like blood or urine for human health studies. While biomarker-based assessment often relies on a single molecule, metabolic profiling combines several metabolites to create a more complex and more specific fingerprint of the disease. However, in contrast to genomics, there is no unique metabolomics setup able to measure the entire metabolome. This challenge leads to tedious and resource consuming preliminary studies to be able to design the right metabolomics experiment. In that context, computer assisted metabolic profiling can be of strong added value to design metabolomics studies more quickly and efficiently. We propose a constraint-based modelling approach which predicts in silico profiles of metabolites that are more likely to be differentially abundant under a given metabolic perturbation (e.g. due to a genetic disease), using flux simulation. In genome-scale metabolic networks, the fluxes of exchange reactions, also known as the flow of metabolites through their external transport reactions, can be simulated and compared between control and disease conditions in order to calculate changes in metabolite import and export. These import/export flux differences would be expected to induce changes in circulating biofluid levels of those metabolites, which can then be interpreted as potential biomarkers or metabolites of interest. In this study, we present SAMBA (SAMpling Biomarker Analysis), an approach which simulates fluxes in exchange reactions following a metabolic perturbation using random sampling, compares the simulated flux distributions between the baseline and modulated conditions, and ranks predicted differentially exchanged metabolites as potential biomarkers for the perturbation. 
We show that there is a good fit between simulated metabolic exchange profiles and experimental differential metabolites detected in plasma, such as patient data from the disease database OMIM, and metabolic trait-SNP associations found in mGWAS studies. These biomarker recommendations can provide insight into the underlying mechanism or metabolic pathway perturbation lying behind observed metabolite differential abundances, and suggest new metabolites as potential avenues for further experimental analyses.
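The comparison step described above (contrast sampled exchange fluxes between a baseline and a perturbed condition, then rank metabolites by how much they shift) can be sketched as follows. This is a minimal illustration only: the scoring function (a standardised mean difference), the name `rank_exchange_shifts`, and the hard-coded flux values are assumptions made for the example and are not taken from SAMBA, whose actual metric and sampler are not described in this abstract.

```python
import math
from statistics import mean, variance

def rank_exchange_shifts(baseline, perturbed):
    """Rank exchange reactions by the shift in their sampled flux distribution.

    baseline / perturbed: dict mapping reaction id -> list of sampled fluxes.
    Returns (reaction id, score) pairs sorted by decreasing absolute score.
    The score is the difference of means divided by the pooled standard
    deviation (an assumed scoring choice for this sketch).
    """
    scores = {}
    for rxn in baseline:
        diff = mean(perturbed[rxn]) - mean(baseline[rxn])
        pooled_sd = math.sqrt((variance(baseline[rxn]) + variance(perturbed[rxn])) / 2)
        if pooled_sd == 0:
            # No spread in either condition: any non-zero shift is unbounded.
            scores[rxn] = 0.0 if diff == 0 else math.copysign(math.inf, diff)
        else:
            scores[rxn] = diff / pooled_sd
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical sampled fluxes for three exchange reactions (placeholder ids).
baseline = {
    "EX_a": [1.0, 1.2, 0.8, 1.0],
    "EX_b": [2.0, 2.1, 1.9, 2.0],
    "EX_c": [0.5, 0.4, 0.6, 0.5],
}
perturbed = {
    "EX_a": [3.0, 3.2, 2.8, 3.0],
    "EX_b": [2.0, 1.9, 2.1, 2.0],
    "EX_c": [0.1, 0.2, 0.0, 0.1],
}

ranking = rank_exchange_shifts(baseline, perturbed)
for rxn, score in ranking:
    print(f"{rxn}: {score:+.2f}")
```

In a real study the flux lists would come from a flux sampler run on the genome-scale model under each condition; here they are fixed so the ranking is reproducible.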

Journal Article


Integrated transcriptomics and metabolomics reveal signatures of lipid metabolism dysregulation in HepaRG liver cells exposed to PCB 126

by Biserni, Martina , Frainay, Clément , Antoniou, Michael N in Animal research , Biocompatibility , Biomarkers

2018

Chemical pollutant exposure is a risk factor contributing to the growing epidemic of non-alcoholic fatty liver disease (NAFLD) affecting human populations that consume a western diet. Although it is recognized that intoxication by chemical pollutants can lead to NAFLD, there is limited information available regarding the mechanism by which typical environmental levels of exposure can contribute to the onset of this disease. Here, we describe the alterations in gene expression profiles and metabolite levels in the human HepaRG liver cell line, a validated model for cellular steatosis, exposed to the polychlorinated biphenyl (PCB) 126, one of the most potent chemical pollutants that can induce NAFLD. Sparse partial least squares classification of the molecular profiles revealed that exposure to PCB 126 provoked a decrease in polyunsaturated fatty acids as well as an increase in sphingolipid levels, concomitant with a decrease in the activity of genes involved in lipid metabolism. This was associated with an increased oxidative stress reflected by marked disturbances in taurine metabolism. A gene ontology analysis showed hallmarks of an activation of the AhR receptor by dioxin-like compounds. These changes in metabolome and transcriptome profiles were observed even at the lowest concentration (100 pM) of PCB 126 tested. A decrease in docosatrienoate levels was the most sensitive biomarker. Overall, our integrated multi-omics analysis provides mechanistic insight into how this class of chemical pollutant can cause NAFLD. Our study lays the foundation for the development of molecular signatures of toxic effects of chemicals causing fatty liver diseases to move away from a chemical risk assessment based on in vivo animal experiments.

Journal Article


Mind the Gap: Mapping Mass Spectral Databases in Genome-Scale Metabolic Networks Reveals Poorly Covered Areas

by Yanes, Oscar , Frainay, Clément , Jourdan, Fabien in Acids , Biochemistry , Carbon

2018

The use of mass spectrometry-based metabolomics to study human, plant and microbial biochemistry and their interactions with the environment largely depends on the ability to annotate metabolite structures by matching mass spectral features of the measured metabolites to curated spectra of reference standards. While reference databases for metabolomics now provide information for hundreds of thousands of compounds, barely 5% of these known small molecules have experimental data from pure standards. Remarkably, it is still unknown how well existing mass spectral libraries cover the biochemical landscape of prokaryotic and eukaryotic organisms. To address this issue, we have investigated the coverage of 38 genome-scale metabolic networks by public and commercial mass spectral databases, and found that on average only 40% of nodes in metabolic networks could be mapped by mass spectral information from standards. Next, we deciphered computationally which parts of the human metabolic network are poorly covered by mass spectral libraries, revealing gaps in the eicosanoids, vitamins and bile acid metabolism. Finally, our network topology analysis based on the betweenness centrality of metabolites revealed the top 20 most important metabolites that, if added to MS databases, may facilitate human metabolome characterization in the future.
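Betweenness centrality scores a node by the number of shortest paths between other nodes that pass through it. As an illustration of the kind of topology analysis mentioned above, the sketch below computes unnormalised betweenness on a small undirected, unweighted graph using Brandes' algorithm. The graph and its node names are hypothetical, and nothing here is claimed to match the article's implementation or network representation.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm).

    adj: dict mapping each node to an iterable of its neighbours,
         describing an undirected, unweighted graph.
    """
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from source first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}

# Hypothetical five-metabolite chain: m1 - m2 - m3 - m4 - m5
toy_network = {
    "m1": ["m2"],
    "m2": ["m1", "m3"],
    "m3": ["m2", "m4"],
    "m4": ["m3", "m5"],
    "m5": ["m4"],
}
scores = betweenness(toy_network)
top = max(scores, key=scores.get)
print(top, scores[top])
```

On this chain the middle node lies on the most shortest paths, so it ranks first; in a metabolic network the same ranking would highlight compounds that bridge otherwise distant regions.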

Journal Article


Targeted versus untargeted omics — the CAFSA story

by Colsch, Benoit , Lamari, Foudil , Sedel, Frédéric in Acids , Biochemistry , Carnitine

2018

Background: In 2009, untargeted metabolomics led to the delineation of a new clinico-biological entity called cerebellar ataxia with elevated cerebrospinal free sialic acid, or CAFSA. In order to elucidate CAFSA, we applied sequentially targeted and untargeted omic approaches. Methods and results: First, we studied five of the six CAFSA patients initially described. Besides increased CSF free sialic acid concentrations, three patients presented with markedly decreased 5-methyltetrahydrofolate (5-MTHF) CSF concentrations. Exome sequencing identified a homozygous POLG mutation in two affected sisters, but failed to identify a causative gene in the three sporadic patients with high sialic acid but low 5-MTHF. Using targeted mass spectrometry, we confirmed that free sialic acid was increased in the CSF of a third known POLG-mutated patient. We then pursued pathophysiological analyses of CAFSA using mass spectrometry-based metabolomics on CSF from two sporadic CAFSA patients as well as 95 patients with an unexplained encephalopathy and 39 controls. This led to the identification of a common metabotype between the two initial CAFSA patients and three additional patients, including one patient with Kearns-Sayre syndrome. Metabolites of the CSF metabotype were positioned in a reconstruction of the human metabolic network, which highlighted the proximity of the metabotype with acetyl-CoA and carnitine, two key metabolites regulating mitochondrial energy homeostasis. Conclusion: Our genetic and metabolomics analyses suggest that CAFSA is a heterogeneous entity related to mitochondrial DNA alterations either through POLG mutations or a mechanism similar to what is observed in Kearns-Sayre syndrome.

Journal Article


PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration

by Lai, Rachel PJ , Bowler, Russell , Ebbels, Timothy in Analysis , Biological activity , Biological analysis

2024

As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. PathIntegrate is available as an open-source Python package.

Journal Article



