Catalogue Search | MBRL

A modified decision tree approach to improve the prediction and mutation discovery for drug resistance in Mycobacterium tuberculosis

by Deelder, Wouter , Campino, Susana , Palla, Luigi in Accuracy , Algorithms , Animal Genetics and Genomics

2022

Background Drug resistant Mycobacterium tuberculosis is complicating the effective treatment and control of tuberculosis disease (TB). With the adoption of whole genome sequencing as a diagnostic tool, machine learning approaches are being employed to predict M. tuberculosis resistance and identify underlying genetic mutations. However, machine learning approaches can overfit and fail to identify causal mutations if they are applied out of the box and not adapted to the disease-specific context. We introduce a machine learning approach that is customized to the TB setting, which extracts a library of genomic variants re-occurring across individual studies to improve genotypic profiling. Results We developed a customized decision tree approach, called Treesist-TB, that performs TB drug resistance prediction by extracting and evaluating genomic variants across multiple studies. The application of Treesist-TB to rifampicin (RIF), isoniazid (INH) and ethambutol (EMB) drugs, for which resistance mutations are known, demonstrated a level of predictive accuracy similar to the widely used TB-Profiler tool (Treesist-TB vs. TB-Profiler tool: RIF 97.5% vs. 97.6%; INH 96.8% vs. 96.5%; EMB 96.8% vs. 95.8%). Application of Treesist-TB to less understood second-line drugs of interest, ethionamide (ETH), cycloserine (CYS) and para-aminosalicylic acid (PAS), led to the identification of new variants (52, 6 and 11, respectively), with a high number absent from the TB-Profiler library (45, 4, and 6, respectively). Thereby, Treesist-TB had improved predictive sensitivity (Treesist-TB vs. TB-Profiler tool: PAS 64.3% vs. 38.8%; CYS 45.3% vs. 30.7%; ETH 72.1% vs. 71.1%). Conclusion Our work reinforces the utility of machine learning for drug resistance prediction, while highlighting the need to customize approaches to the disease-specific context.
Through applying a modified decision learning approach (Treesist-TB) across a range of anti-TB drugs, we identified plausible resistance-encoding genomic variants with high predictive ability, whilst potentially overcoming the overfitting challenges that can affect standard machine learning applications.
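The sensitivity figures quoted in this abstract compare genotype-based predictions against phenotypic resistance calls. The sketch below shows how such a figure can be computed. It is a minimal illustration, not the Treesist-TB method: it assumes a plain "any library variant present" lookup rule rather than a decision tree, and the variant labels, the isolates and the function names are all hypothetical.

```python
from typing import Sequence, Set, Tuple


def predict_resistant(isolate_variants: Set[str], library: Set[str]) -> bool:
    """Call an isolate resistant if it carries at least one library variant.

    Assumption: a simple lookup rule used only for illustration.
    """
    return not library.isdisjoint(isolate_variants)


def sensitivity_specificity(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predictions against phenotypes."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical variant library and isolates (labels are placeholders).
library = {"geneA_v1", "geneB_v2"}
isolates = [
    {"geneA_v1"},              # phenotypically resistant
    {"geneB_v2", "geneC_v9"},  # phenotypically susceptible
    {"geneC_v9"},              # phenotypically resistant
    set(),                     # phenotypically susceptible
    {"geneA_v1", "geneB_v2"},  # phenotypically resistant
]
observed = [True, False, True, False, True]

predicted = [predict_resistant(v, library) for v in isolates]
sens, spec = sensitivity_specificity(predicted, observed)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this toy data the third isolate is resistant but carries no library variant, which is the kind of missed case that an expanded variant library is meant to reduce.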

Journal Article

Using deep learning to identify recent positive selection in malaria parasite sequence data

by Manko, Emilia , Campino, Susana , Palla, Luigi in Accuracy , Algorithms , Biomedical and Life Sciences

2021

Background Malaria, caused by Plasmodium parasites, is a major global public health problem. To assist an understanding of malaria pathogenesis, including drug resistance, there is a need for the timely detection of underlying genetic mutations and their spread. With the increasing use of whole-genome sequencing (WGS) of Plasmodium DNA, the potential of deep learning models to detect loci under recent positive selection, historically signals of drug resistance, was evaluated. Methods A deep learning-based approach (called “DeepSweep”) was developed, which can be trained on haplotypic images from genetic regions with known sweeps, to identify loci under positive selection. DeepSweep software is available from https://github.com/WDee/Deepsweep. Results Using simulated genomic data, DeepSweep could detect recent sweeps with high predictive accuracy (areas under ROC curve > 0.95). DeepSweep was applied to Plasmodium falciparum (n = 1125; genome size 23 Mbp) and Plasmodium vivax (n = 368; genome size 29 Mbp) WGS data, and the genes identified overlapped with two established extended haplotype homozygosity methods (within-population iHS, across-population Rsb) (~60–75% overlap of hits at P < 0.0001). DeepSweep hits included regions proximal to known drug resistance loci for both P. falciparum (e.g. pfcrt, pfdhps and pfmdr1) and P. vivax (e.g. pvmrp1). Conclusion The deep learning approach can detect positive selection signatures in malaria parasite WGS data. Further, as the approach is generalizable, it may be trained to detect other types of selection. With the ability to rapidly generate WGS data at low cost, machine learning approaches (e.g. DeepSweep) have the potential to assist parasite genome-based surveillance and inform malaria control decision-making.

Journal Article

Geographical classification of malaria parasites through applying machine learning to whole genome sequence data

by Deelder, Wouter , Phelan, Jody E. , Manko, Emilia in 631/114/1305 , 631/208/457 , Artificial Intelligence

2022

Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country-level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and > 90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities.
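The abstract reports "distance errors" between predicted and true locations without saying how they were measured. One common choice is the great-circle (haversine) distance, sketched below. This is an assumption made for illustration; the function names and the coordinates are hypothetical and do not come from the study.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_distance_error_km(
    predicted: Sequence[Tuple[float, float]],
    actual: Sequence[Tuple[float, float]],
) -> float:
    """Mean haversine distance between predicted and actual (lat, lon) pairs."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)
    )
    return total / len(predicted)


# Hypothetical predicted and true sampling locations (lat, lon in degrees).
predicted_locs = [(0.0, 0.0), (10.0, 20.0)]
true_locs = [(0.0, 0.0), (10.0, 20.0)]
print(mean_distance_error_km(predicted_locs, true_locs))
```

A classifier that outputs a country or site label can be scored the same way by mapping each label to a representative coordinate before computing the distance.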

Journal Article

COVID-profiler: a webserver for the analysis of SARS-CoV-2 sequencing data

by Deelder, Wouter , Ward, Daniel , Hibberd, Martin L. in Algorithms , Binding sites , Bioinformatics

2022

Background SARS-CoV-2 virus sequencing has been applied to track the COVID-19 pandemic spread and assist the development of PCR-based diagnostics, serological assays, and vaccines. With sequencing becoming routine globally, bioinformatic tools are needed to assist in the robust processing of resulting genomic data. Results We developed a web-based bioinformatic pipeline (“COVID-Profiler”) that inputs raw or assembled sequencing data, displays raw alignments for quality control, annotates mutations found and performs phylogenetic analysis. The pipeline software can be applied to other (re-)emerging pathogens. Conclusions The webserver is available at http://genomics.lshtm.ac.uk/. The source code is available at https://github.com/jodyphelan/covid-profiler.

Journal Article

TB-ML—a framework for comparing machine learning approaches to predict drug resistance of Mycobacterium tuberculosis

by Deelder, Wouter , Libiseller-Egger, Julian , Phelan, Jody E in Antimicrobial resistance , Application Note , Drug resistance

2023

Abstract Motivation Machine learning (ML) has shown impressive performance in predicting antimicrobial resistance (AMR) from sequence data, including for Mycobacterium tuberculosis, the causative agent of tuberculosis. However, current ML development and publication practices make it difficult for researchers and clinicians to use, test or reproduce published models. Results We packaged a number of published and unpublished ML models for predicting AMR of M.tuberculosis into Docker containers. Similarly, the pipelines required for pre-processing genomic data into the formats required by the models were also packaged into separate containers. By following a minimal container I/O standard, we ensured as much interoperability as possible. We also created a command-line application, TB-ML, which can be used to easily combine pre-processing and prediction containers into complete pipelines ready for predicting resistance from novel, raw data with a single command. As long as there is adherence to this minimal standard for the container interface, containers produced by researchers holding new models can likewise be included in these pipelines, making benchmark comparisons of different models simple and facilitating faster uptake in the clinic. Availability and implementation TB-ML contains a simple Docker API written in Python and is available at https://github.com/jodyphelan/tb-ml. Example Docker containers for resistance prediction and corresponding data pre-processing as well as a tutorial on how to create new containers for TB-ML are available at https://tb-ml.github.io/tb-ml-containers/. Contact jody.phelan@lshtm.ac.uk

Journal Article

Controlling the SARS-CoV-2 outbreak, insights from large scale whole genome sequences generated across the world

by Deelder, Wouter , Ward, Daniel , Campino, Susana in ACE2 , Angiotensin-converting enzyme 2 , Coronaviruses

2020

Background: SARS-CoV-2 most likely evolved from a bat beta-coronavirus and started infecting humans in December 2019. Since then it has rapidly infected people around the world, with more than 3 million confirmed cases by the end of April 2020. Early genome sequencing of the virus has enabled the development of molecular diagnostics and the commencement of therapy and vaccine development. The analysis of the early sequences showed relatively few evolutionary selection pressures. However, with the rapid worldwide expansion into diverse human populations, significant genetic variations are becoming increasingly likely. The current limitations on social movement between countries also offer the opportunity for these viral variants to become distinct strains with potential implications for diagnostics, therapies and vaccines. Methods: We used the current sequencing archives (NCBI and GISAID) to investigate 5,349 whole genomes, looking for evidence of strain diversification and selective pressure. Results: We used 3,958 SNPs to build a phylogenetic tree of SARS-CoV-2 diversity and noted strong evidence for the existence of two major clades and six sub-clades, unevenly distributed across the world. We also noted that convergent evolution has potentially occurred across several locations in the genome, showing selection pressures, including on the spike glycoprotein where we noted a potentially critical mutation that could affect its binding to the ACE2 receptor. We also report on mutations that could prevent current molecular diagnostics from detecting some of the sub-clades. Conclusions: The worldwide whole genome sequencing effort is revealing the challenge of developing SARS-CoV-2 containment tools suitable for everyone and the need for data to be continually evaluated to ensure accuracy in outbreak estimations. Competing Interest Statement The authors have declared no competing interest.
Footnotes * This revision has updated the number of samples analysed from 5,349 to 15,487. Minor edits in methods and updated literature. * https://www.gisaid.org/ * https://submit.ncbi.nlm.nih.gov/sarscov2/

Paper

Franchising in frontier markets: what's working, what's not and why

by Deelder, Wouter , Beck, Steve , Miller, Robin in Entrepreneurship , Franchises , Management

2010

Journal Article

Franchising in Frontier Markets

by Deelder, Wouter , Miller, Robin , Beck, Steve in Advisors , Business models , Consumers

2010

Franchising seems to have a lot to offer frontier markets that seek to develop their local economies. In the last few years researchers and the international development community have begun to promote franchising as potentially the "next big thing" in development. Yet when the authors researched low-income markets, they found relatively few large-scale international franchises operating in these challenging markets. Compounding the challenge of limited disposable income that constrains all businesses in these markets, the study identified two key barriers to the growth of franchise businesses in frontier markets: limited access to finance, and the lack of legal frameworks to manage franchise relationships and assets. While franchising may be more efficient than the current public or foreign-aid funded delivery of health care and education services in these markets, no one has yet proven a self-sustaining model that can meet these needs at a large scale. Sector economics in health and education in low-income markets are especially problematic.

Trade Publication Article

A metabolomic profile is associated with the risk of incident coronary heart disease

by Böhringer, Stefan , Göraler, Sibel , Boer, Jolanda M.A. in Adult , Angina pectoris , Body mass index

2014

Metabolomics, defined as the comprehensive identification and quantification of low-molecular-weight metabolites to be found in a biological sample, has been put forward as a potential tool for classifying individuals according to their risk of coronary heart disease (CHD). Here, we investigated whether a single-point blood measurement of the metabolome is associated with and predictive for the risk of CHD. We obtained proton nuclear magnetic resonance spectra in 79 cases who developed CHD during follow-up (median 8.1 years) and in 565 randomly selected individuals. In these spectra, 100 signals representing 36 metabolites were identified. Applying least absolute shrinkage and selection operator regression, we defined a weighted metabolite score consisting of 13 proton nuclear magnetic resonance signals that optimally predicted CHD. This metabolite score, including signals representing a lipid fraction, glucose, valine, ornithine, glutamate, creatinine, glycoproteins, citrate, and 1,5-anhydrosorbitol, was associated with the incidence of CHD independent of traditional risk factors (TRFs) (hazard ratio 1.50, 95% CI 1.12-2.01). Predictive performance of this metabolite score on its own was moderate (C-index 0.75, 95% CI 0.70-0.80), but after adding age and sex, the C-index was only modestly lower than that of TRFs (C-index 0.81, 95% CI 0.77-0.85 and C-index 0.82, 95% CI 0.78-0.87, respectively). The metabolite score was also associated with prevalent CHD independent of TRFs (odds ratio 1.59, 95% CI 1.19-2.13). A metabolite score derived from a single-point metabolome measurement is associated with CHD, and metabolomics may be a promising tool for refining and improving the prediction of CHD.
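The C-index values in this abstract summarise how well a risk score ranks individuals by time to event. The sketch below computes a concordance index in the style of Harrell's C on toy data. It is an illustration only: the abstract does not state which estimator the authors used, and the follow-up times, event flags and scores here are made up.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs in which the higher risk score belongs
    to the individual with the earlier event; tied scores count as 0.5.

    A pair (i, j) is comparable when i had an event and times[i] < times[j].
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in years, event indicator, and a hypothetical risk score.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
scores = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, scores)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.75 to 0.82 should be read.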

Journal Article
