136 results for "Prosperi, Mattia"
Causal inference and counterfactual prediction in machine learning for actionable healthcare
Big data, high-performance computing, and (deep) machine learning are increasingly becoming key to precision medicine—from identifying disease risks and taking preventive measures, to making diagnoses and personalizing treatment for individuals. Precision medicine, however, is not only about predicting risks and outcomes, but also about weighing interventions. Interventional clinical predictive models require the correct specification of cause and effect, and the calculation of so-called counterfactuals, that is, alternative scenarios. In biomedical research, observational studies are commonly affected by confounding and selection bias. Without robust assumptions, often requiring a priori domain knowledge, causal inference is not feasible. Data-driven prediction models are often mistakenly used to infer causal effects, but neither their parameters nor their predictions necessarily have a causal interpretation. Therefore, the premise that data-driven prediction models lead to trustworthy decisions and interventions for precision medicine is questionable. When pursuing intervention modelling, the bio-health informatics community needs to employ causal approaches and learn causal structures. Here we discuss how target trials (algorithmic emulation of randomized studies), transportability (the licence to transfer causal effects from one population to another) and prediction invariance (where a true causal model is contained in the set of all prediction models whose accuracy does not vary across different settings) are linchpins to developing and testing intervention models. Machine learning models are commonly used to predict risks and outcomes in biomedical research. But healthcare often requires information about cause–effect relations and alternative scenarios, that is, counterfactuals. Prosperi et al. discuss the importance of interventional and counterfactual models, as opposed to purely predictive models, in the context of precision medicine.
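As a rough illustration of the confounding problem the abstract describes (a hypothetical sketch, not code from the paper): on simulated data with a single confounder, a purely predictive regression coefficient differs from the causal effect, while a simple backdoor adjustment recovers it.

```python
# Hedged illustration (not from the paper): why predictive coefficients need not be causal.
# A simulated confounder C drives both treatment T and outcome Y; the true causal
# effect of T on Y is 2.0. A naive regression of Y on T alone absorbs the confounding,
# while adjusting for C (backdoor adjustment) recovers the causal effect.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50_000
C = rng.normal(size=n)                               # confounder
T = (C + rng.normal(size=n) > 0).astype(float)       # treatment influenced by C
Y = 2.0 * T + 3.0 * C + rng.normal(size=n)           # outcome: true effect of T is 2.0

naive = LinearRegression().fit(T.reshape(-1, 1), Y)
adjusted = LinearRegression().fit(np.column_stack([T, C]), Y)

print(f"naive (predictive) coefficient:  {naive.coef_[0]:.2f}")    # biased upward
print(f"confounder-adjusted coefficient: {adjusted.coef_[0]:.2f}")  # close to 2.0
```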
A comparative study of antibiotic resistance patterns in Mycobacterium tuberculosis
This study leverages the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) to analyze over 27,000 Mycobacterium tuberculosis (MTB) genomic strains, providing a comprehensive, large-scale overview of antimicrobial resistance (AMR) prevalence and resistance patterns. We used MTB++, the newest and most comprehensive AI-based MTB drug-resistance profiler, to predict the resistance profile of each of the 27,000 MTB isolates, and then used feature analysis to identify key genes associated with resistance. This study makes three main contributions. First, it provides a detailed picture of the prevalence of specific AMR genes in the BV-BRC dataset and their biological implications, offering critical insight into MTB's resistance mechanisms and helping to identify genes of high priority for further investigation. Second, it compares the prevalence of antibiotic resistance across previous studies that have addressed both the temporal and geographical evolution of MTB drug resistance. Third, it emphasizes the need for targeted diagnostics and personalized treatment plans. In addition to these contributions, the study acknowledges the limitations of computational prediction and recommends future experimental validation.
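A minimal, hypothetical sketch of the kind of feature analysis described above (not the MTB++ pipeline itself): given a placeholder gene presence/absence matrix and a resistance label per isolate, rank genes by their association with the label.

```python
# Hypothetical sketch (not MTB++): rank candidate genes by importance for a
# resistance label using a random forest. All gene names and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
genes = [f"gene_{i}" for i in range(20)]           # placeholder gene names
X = rng.integers(0, 2, size=(500, len(genes)))     # presence/absence per isolate
# Synthetic label: resistance driven mainly by gene_3 and gene_7 (illustrative only)
y = ((X[:, 3] + X[:, 7] + rng.normal(scale=0.3, size=500)) > 1).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(genes, model.feature_importances_), key=lambda g: -g[1])
for name, score in ranked[:5]:
    print(f"{name}: importance {score:.3f}")
```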
Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing
Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has the potential to replace Sanger sequencing in many fields, including de-novo sequencing, re-sequencing, meta-genomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies from NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, and an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants. Results: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients. Conclusions: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a determined amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors.
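A hedged sketch of the multinomial view of amplicon reads that the abstract alludes to (not the paper's reconstruction algorithm): assumed variant frequencies within one sliding window, the probability of an observed read-count split, and the chance that a rare variant is missed at a given coverage.

```python
# Hedged sketch (not the paper's algorithm): a multinomial view of reads in one
# sliding window (amplicon). Frequencies, counts, and coverage are illustrative.
from scipy.stats import multinomial

freqs = [0.70, 0.25, 0.05]        # assumed frequencies of three variants in a window
counts = [68, 27, 5]              # observed reads per variant (100 reads total)
print("P(observed counts):", multinomial.pmf(counts, n=sum(counts), p=freqs))

coverage = 100
p_missed = (1 - freqs[2]) ** coverage  # probability the 5% variant gets zero reads
print(f"P(rare variant unseen at {coverage}x coverage): {p_missed:.2e}")
```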
Emergence of recombinant Mayaro virus strains from the Amazon basin
Mayaro virus (MAYV), the causative agent of Mayaro fever, is an arbovirus transmitted by Haemagogus mosquitoes. Despite recent attention due to the identification of several cases in South and Central America and the Caribbean, limited information on MAYV evolution and epidemiology exists and represents a barrier to prevention of further spread. We present a thorough spatiotemporal evolutionary study of MAYV full-genome sequences collected over the last sixty years within South America and Haiti, revealing recent recombination events and adaptation to a broad host and vector range, including Aedes mosquito species. We employed a Bayesian phylogeography approach to characterize the emergence of recombinants in Brazil and Haiti and report evidence in favor of a putative role of human mobility in facilitating recombination among MAYV strains from geographically distinct regions. The spatiotemporal characteristics of these recombination events and the emergence of this previously neglected virus in Haiti, a known hub for pathogen spread to the Americas, warrant close monitoring of MAYV infection in the immediate future.
Big data hurdles in precision medicine and precision public health
Background: Nowadays, trendy research in biomedical sciences juxtaposes the term 'precision' with medicine and public health, alongside companion words like big data, data science, and deep learning. Technological advancements permit the collection and merging of large heterogeneous datasets from different sources, from genome sequences to social media posts, and from electronic health records to wearables. Additionally, complex algorithms supported by high-performance computing allow one to transform these large datasets into knowledge. Despite such progress, many barriers still exist to achieving precision medicine and precision public health interventions for the benefit of the individual and the population. Main body: The present work focuses on analyzing both the technical and societal hurdles related to the development of prediction models of health risks, diagnoses and outcomes from integrated biomedical databases. Methodological challenges that need to be addressed include improving the semantics of study designs: medical record data are inherently biased, and even the most advanced deep-learning denoising autoencoders cannot overcome the bias if it is not handled a priori by design. Societal challenges include the evaluation of ethically actionable risk factors at the individual and population level; for instance, the use of gender, race, or ethnicity as risk modifiers, rather than as biological variables, could be replaced by modifiable environmental proxies such as lifestyle and dietary habits, household income, or access to educational resources. Conclusions: Data science for precision medicine and public health warrants an informatics-oriented formalization of the study design and interoperability throughout all levels of the knowledge inference process, from the research semantics, to model development, and ultimately to implementation.
Predictors of first-line antiretroviral therapy discontinuation due to drug-related adverse events in HIV-infected patients: a retrospective cohort study
Background: Drug-related toxicity has been one of the main causes of antiretroviral treatment discontinuation, but its determinants are not fully understood. The aim of this study was to investigate predictors of first-line antiretroviral therapy discontinuation due to adverse events and their evolution in recent years. Methods: Patients starting first-line antiretroviral therapy were retrospectively selected. The primary end-point was the time to discontinuation of therapy due to adverse events; incidence was estimated, and Kaplan-Meier and multivariable Cox regression models were fitted on patients' baseline clinical, demographic, and biochemical markers. Results: 1,096 patients were included; 302 discontinuations for adverse events were observed over 1,861 person-years of follow-up between 1988 and 2010, corresponding to an incidence (95% CI) of 0.16 (0.14-0.18). By Kaplan-Meier estimation, the probabilities (95% CI) of being free from an adverse event at 90 days, 180 days, one year, two years, and five years were 0.88 (0.86-0.90), 0.85 (0.83-0.87), 0.79 (0.76-0.81), 0.70 (0.67-0.74), and 0.55 (0.50-0.61), respectively. The most common adverse events were gastrointestinal symptoms (28.5%), hematological (13.2%) or metabolic (lipid and glucose metabolism, lipodystrophy) (11.3%) toxicities, and hypersensitivity reactions (9.3%). Factors associated with an increased hazard of adverse events were older age, CDC stage C, female gender, homo/bisexual risk group (vs. heterosexual), and HBsAg positivity. Among drugs, zidovudine, stavudine, zalcitabine, didanosine, full-dose ritonavir, and indinavir, but also efavirenz (still recommended for first-line regimens), were associated with an increased hazard of toxicity. Moreover, patients infected by HIV genotype F1 showed a trend towards a higher risk of adverse events. Conclusions: After starting antiretroviral therapy, the probability of remaining free from adverse events decreases over time. Among the drugs associated with increased toxicity, only one is currently recommended for first-line regimens, and with an improved formulation. Older age, CDC stage, MSM risk factor and gender are also associated with an increased hazard of toxicity and should be considered when designing a first-line regimen.
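The general Kaplan-Meier and Cox workflow described above can be sketched on synthetic data as follows (this is not the study's code; the column names and the use of the lifelines library are assumptions for illustration).

```python
# Hedged sketch (not the study's analysis): Kaplan-Meier survival estimation and a
# Cox proportional-hazards model on synthetic data with placeholder covariates.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "female": rng.integers(0, 2, n),
    "time_years": rng.exponential(3.0, n),     # time to discontinuation or censoring
    "discontinued": rng.integers(0, 2, n),     # 1 = discontinued for an adverse event
})

kmf = KaplanMeierFitter().fit(df["time_years"], event_observed=df["discontinued"])
print(kmf.survival_function_.head())           # probability of remaining event-free

cph = CoxPHFitter().fit(df, duration_col="time_years", event_col="discontinued")
cph.print_summary()                            # hazard ratios for age, sex, ...
```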
A novel methodology for large-scale phylogeny partition
Phylogenetic analysis is used to identify transmission chains, but no software is available for the automated partition of large phylogenies. Prosperi et al. apply a new search algorithm to identify transmission clusters within the phylogeny of HIV-1 gene sequences, linking molecular and epidemiological data. Understanding the determinants of virus transmission is a fundamental step for the effective design of screening and intervention strategies to control viral epidemics. Phylogenetic analysis can be a valid approach for the identification of transmission chains, and very large data sets can be analysed through parallel computation. Here we propose and validate a new methodology for the partition of large-scale phylogenies and the inference of transmission clusters. This approach, based on a depth-first search algorithm, combines the evaluation of node reliability, tree topology, and patristic distance analysis. The method has been applied to identify transmission clusters within a phylogeny of 11,541 human immunodeficiency virus-1 subtype B pol gene sequences from a large Italian cohort. Molecular transmission chains were characterized by means of different clinical/demographic factors, such as the interaction between male homosexuals and male heterosexuals. Our method takes advantage of a flexible notion of transmission cluster and can become a general framework to analyse other epidemics.
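A hypothetical, simplified sketch of the depth-first clustering idea (not the authors' implementation): walk the tree from the root and accept a clade as a transmission cluster when its node support is high and its patristic diameter is small. The tree structure, thresholds, and data below are illustrative.

```python
# Simplified sketch of depth-first transmission clustering (not the paper's software).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""
    support: float = 1.0          # node reliability (e.g., bootstrap proportion)
    branch: float = 0.0           # branch length leading to this node
    children: list = field(default_factory=list)

def height_and_diameter(node):
    """Return (height, diameter): longest node-to-leaf path and longest leaf-to-leaf
    patristic distance within the clade rooted at `node`."""
    if not node.children:
        return 0.0, 0.0
    heights, diameters = [], []
    for child in node.children:
        h, d = height_and_diameter(child)
        heights.append(h + child.branch)
        diameters.append(d)
    top_two = sorted(heights, reverse=True)[:2]
    return max(heights), max(max(diameters), sum(top_two))

def leaves(node):
    return [node.name] if not node.children else [x for c in node.children for x in leaves(c)]

def find_clusters(node, min_support=0.9, max_diameter=0.05):
    """Depth-first search: report the largest clades passing both thresholds."""
    _, diameter = height_and_diameter(node)
    if node.children and node.support >= min_support and diameter <= max_diameter:
        return [leaves(node)]
    return [c for child in node.children for c in find_clusters(child, min_support, max_diameter)]

# Tiny example tree: (A,B) is well supported and tight; (C,D) is poorly supported.
tree = Node(support=1.0, children=[
    Node(support=0.98, branch=0.01, children=[Node("A", branch=0.01), Node("B", branch=0.02)]),
    Node(support=0.60, branch=0.04, children=[Node("C", branch=0.03), Node("D", branch=0.05)]),
])
print(find_clusters(tree))   # [['A', 'B']] under these thresholds
```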
Fast and exact quantification of motif occurrences in biological sequences
Background: Identification of motifs and quantification of their occurrences are important for the study of genetic diseases, gene evolution, transcription sites, and other biological mechanisms. Exact formulae for estimating count distributions of motifs under Markovian assumptions have high computational complexity and are impractical for large motif sets. Approximated formulae, e.g. based on compound Poisson, are faster, but reliable p-value calculation remains challenging. Here, we introduce 'motif_prob', a fast implementation of an exact formula for the motif count distribution through progressive approximation with arbitrary precision. Our implementation speeds up the exact calculation, which is usually impractical, making it feasible and positioned to replace currently employed heuristics. Results: We implement motif_prob in both Perl and C++, using an efficient error-bound iterative process for the exact formula, and provide comparisons with state-of-the-art tools (e.g. MoSDi) in terms of precision and run-time benchmarks, along with a real-world use case on bacterial motif characterization. Our software is able to process a million motifs (13–31 bases) over genome lengths of 5 million bases within a minute on a regular laptop, and the run times for both the Perl and C++ code are several orders of magnitude smaller (50–1000× faster) than MoSDi, even when using their fast compound Poisson approximation (60–120× faster). In the real-world use case, we first show the consistency of motif_prob with MoSDi, and then show how p-value quantification is crucial for enrichment quantification when bacteria have different GC content, using motifs found in antimicrobial resistance genes. The software and source code are available under the MIT license at https://github.com/DataIntellSystLab/motif_prob. Conclusions: The motif_prob software is a multi-platform and efficient open-source solution for calculating exact frequency distributions of motifs. It can be integrated with motif discovery/characterization tools to quantify enrichment and deviation from expected frequency ranges with exact p-values, without loss of data-processing efficiency.
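For intuition only (far simpler than the exact formula motif_prob implements): the expected motif count and a Poisson-approximate enrichment p-value under an i.i.d. background model, which also shows why GC content matters when comparing genomes. The motif, counts, and GC value below are made up.

```python
# Hedged sketch (not motif_prob's exact formula): expected count and Poisson-approximate
# enrichment p-value for a motif under an i.i.d. base model with a given GC content.
from scipy.stats import poisson

def motif_probability(motif, gc=0.5):
    """Probability of the motif at a fixed position under an i.i.d. base model."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for b in motif:
        p *= base_p[b]
    return p

def enrichment_pvalue(motif, observed, genome_length, gc=0.5):
    """P(count >= observed) under a Poisson approximation of the occurrence count."""
    expected = (genome_length - len(motif) + 1) * motif_probability(motif, gc)
    return poisson.sf(observed - 1, expected), expected

pval, exp = enrichment_pvalue("GGCGCCGGCGCCG", observed=40, genome_length=5_000_000, gc=0.65)
print(f"expected ~{exp:.1f} occurrences, enrichment p-value {pval:.3g}")
```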
Challenges in Identifying Asthma Subgroups Using Unsupervised Statistical Learning Techniques
Unsupervised statistical learning techniques, such as exploratory factor analysis (EFA) and hierarchical clustering (HC), have been used to identify asthma phenotypes, with only partly consistent results. Some of the inconsistency is caused by variable selection and by demographic and clinical differences among study populations. Our aim was to investigate the effects of the choice of statistical method and of different preparations of the data on the clustering results, and to relate these to disease severity. Several variants of EFA and HC were applied and compared using various sets of variables and different encodings and transformations within a dataset of 383 children with asthma. Variables included lung function, inflammatory and allergy markers, family history, environmental exposures, and medications. Clusters and original variables were related to asthma severity (logistic regression and Bayesian network analysis). EFA identified five components (eigenvalues ≥ 1) explaining 35% of the overall variance. Varying the HC linkage-distance functions did not affect the cluster inference; using different variable encodings and transformations, however, did. The derived clusters predicted asthma severity less accurately than the original variables. Prognostic factors of severity were medication usage, current symptoms, lung function, paternal asthma, body mass index, and age of asthma onset. Bayesian networks indicated conditional dependence among variables. The use of different unsupervised statistical learning methods and different variable sets and encodings can lead to multiple, inconsistent subgroupings of asthma that are not necessarily correlated with severity. The search for asthma phenotypes needs more careful selection of markers, consistent across different study populations, and more cautious interpretation of results from unsupervised learning.
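An illustrative sketch of the sensitivity the study reports (not the study's pipeline): on synthetic data, hierarchical clustering can assign different cluster memberships depending on the linkage function and on whether variables are standardized before clustering. Variable names and cluster counts are placeholders.

```python
# Hedged sketch (synthetic data): cluster sizes from hierarchical clustering typically
# shift with the linkage method and with variable scaling/encoding choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X_raw = np.column_stack([
    rng.normal(100, 15, 200),   # e.g., lung function (% predicted), large scale
    rng.normal(2, 1, 200),      # e.g., log inflammatory marker, small scale
])
X_std = StandardScaler().fit_transform(X_raw)

for name, X in [("raw", X_raw), ("standardized", X_std)]:
    for method in ("ward", "complete"):
        labels = fcluster(linkage(X, method=method), t=3, criterion="maxclust")
        sizes = np.bincount(labels)[1:]
        print(f"{name:12s} {method:8s} cluster sizes: {sizes}")
```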
DeepDynaForecast: Phylogenetic-informed graph deep learning for epidemic transmission dynamic prediction
In the midst of an outbreak or sustained epidemic, reliable prediction of transmission risks and patterns of spread is critical to inform public health programs. Projections of transmission growth or decline among specific risk groups can aid in optimizing interventions, particularly when resources are limited. Phylogenetic trees have been widely used in the detection of transmission chains and high-risk populations. Moreover, tree topology and the incorporation of population parameters (phylodynamics) can be useful in reconstructing the evolutionary dynamics of an epidemic across space and time among individuals. We now demonstrate the utility of phylodynamic trees for transmission modeling and forecasting, developing a phylogeny-based deep learning system referred to as DeepDynaForecast. Our approach leverages a primal-dual graph learning structure with shortcut multi-layer aggregation, which is suited for the early identification and prediction of transmission dynamics in emerging high-risk groups. We demonstrate the accuracy of DeepDynaForecast using simulated outbreak data and the utility of the learned model using empirical, large-scale data from the human immunodeficiency virus epidemic in Florida between 2012 and 2020. Our framework is available as open-source software (MIT license) at github.com/lab-smile/DeepDynaForcast.
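A greatly simplified, hypothetical sketch of the general idea (not the DeepDynaForecast architecture): treat the phylogeny as a graph, propagate node features with a small message-passing network that includes a shortcut connection, and classify each node's transmission dynamic. The layer sizes, features, and classes below are assumptions for illustration.

```python
# Hedged sketch (not DeepDynaForecast): message passing over a phylogeny-as-graph,
# classifying each node's transmission dynamic (e.g., growth / static / decline).
import torch
import torch.nn as nn

class TreeMessagePassing(nn.Module):
    def __init__(self, in_dim, hidden, n_classes=3):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x, adj_norm):
        # Two rounds of neighbor averaging over the normalized tree adjacency,
        # with a shortcut connection between rounds.
        h = torch.relu(adj_norm @ self.lin1(x))
        h = torch.relu(adj_norm @ self.lin2(h)) + h   # shortcut aggregation
        return self.out(h)

# Toy tree with 5 nodes; features could be branch length, node depth, clade size, ...
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]
n = 5
A = torch.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
adj_norm = A / A.sum(dim=1, keepdim=True)      # row-normalized adjacency with self-loops

x = torch.randn(n, 4)                          # placeholder node features
model = TreeMessagePassing(in_dim=4, hidden=16)
logits = model(x, adj_norm)
print(logits.argmax(dim=1))                    # predicted dynamic class per node
```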