11 result(s) for "Hochuli, Joshua"
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening
Traditional best practices for quantitative structure activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we now recommend the use of models with the highest positive predictive value (PPV) built on imbalanced training sets as preferred virtual screening tools. This recommendation stems from practical considerations of how the results of virtual screening are used in experimental laboratories where only a small fraction of virtually screened molecules can be tested using standard well plates. As a proof of concept, we have developed QSAR models for five expansive datasets with different ratios of active and inactive molecules and compared model performance in virtual screening using BA, PPV, and other metrics. We show that training on imbalanced datasets achieves a hit rate at least 30% higher than using balanced datasets, and that the PPV metric captured this difference of performance with no parameter tuning. Importantly, hit rates were estimated for top scoring compounds organized in batches of the size of plates (for instance, 128 molecules) used in the experimental high throughput screening. Based on the results of our studies, we posit that QSAR models trained on imbalanced datasets with the highest PPV should be relied upon to identify and test hit compounds in early drug discovery studies.
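The contrast between balanced accuracy and plate-sized PPV described in this abstract can be illustrated with a minimal sketch. The function names, the toy labels and scores, and the 0.55 classification threshold are illustrative assumptions, not taken from the paper; only the default batch size of 128 comes from the abstract.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def ppv_at_top_n(y_true, scores, n=128):
    """Fraction of true actives among the n highest-scoring compounds."""
    if n <= 0 or not scores:
        raise ValueError("need at least one scored compound and n > 0")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(label for _, label in top) / len(top)


# Toy data (hypothetical): 8 compounds, 3 of them active.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.55 else 0 for s in scores]  # assumed threshold

ba = balanced_accuracy(y_true, y_pred)
ppv_top4 = ppv_at_top_n(y_true, scores, n=4)
print(f"BA = {ba:.3f}, PPV in top 4 = {ppv_top4:.3f}")
```

Balanced accuracy summarizes the whole ranked library, whereas the top-n PPV only looks at the compounds that would actually fit on a plate, which is the quantity the abstract argues matters for virtual screening.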
AI-driven discovery of synergistic drug combinations against pancreatic cancer
Pancreatic cancer treatment often relies on multi-drug regimens, but optimal combinations remain elusive. This study evaluates predictive approaches to identify synergistic drug combinations using a dataset from the National Center for Advancing Translational Sciences (NCATS). Screening 496 combinations of 32 anticancer compounds against the PANC-1 cells experimentally determined the degree of synergism and antagonism. Three research groups (NCATS, University of North Carolina, and Massachusetts Institute of Technology) leverage these data to apply machine learning (ML) approaches, predicting synergy across 1.6 million combinations. Of the 88 tested, 51 show synergy, with graph convolutional networks achieving the best hit rate and random forest the highest precision. Beyond highlighting the potential of ML, this work delivers 307 experimentally validated synergistic combinations, demonstrating its practical impact in treating pancreatic cancer. Finding optimal multi-drug combinations for pancreatic cancer remains a complex task. Here, authors across three different groups apply machine learning approaches to predict synergy across 1.6 million combinations of drugs for pancreatic cancer, 307 of which are validated experimentally.
Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning
Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with their own strengths and weaknesses. Both simulation and empirical studies have identified several “zones” of parameter space where accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e. assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. In this study, we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. Although numerous practical challenges remain, these findings suggest that the deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.
Novel Cheminformatics Tools to Drive Experimental Discovery
The field of “cheminformatics” assists and guides research in drug discovery, materials science, and biology by providing tools for understanding and navigating chemical data. Applications of cheminformatic tools often require data-specific solutions, which are developed and applied in this thesis. First, in response to drug discovery-oriented chemical data trending towards more complexity (in terms of magnitude and dimensionality), methods for validation in hit discovery and application of similarity and statistical models on massive chemical datasets are developed here. Second, novel inhibitors of SARS-CoV-2 infection are developed with a focused cheminformatic pipeline, providing compounds with promise for treating COVID-19. Third, natural product chemical space is cataloged and structurally parsed to guide synthetic development for macrocyclic peptides. A publicly available implementation of a peptide parsing algorithm is reported.
Heli-SMACC: Helicase-targeting SMAll Molecule Compound Collection
Helicases have emerged as promising targets for the development of antiviral drugs; however, the family remains largely undrugged. To support the focused development of viral helicase inhibitors we identified, collected, and integrated all chemogenomics data for all available helicases from the ChEMBL database. After thoroughly curating and enriching the data with relevant annotations we have created a derivative database of helicase inhibitors which we dubbed Heli-SMACC (Helicase-targeting SMAll Molecule Compound Collection). The current version of Heli-SMACC contains 20,432 bioactivity entries for viral, human, and bacterial helicases. We have selected 30 compounds with promising viral helicase activity and tested them in a SARS-CoV-2 NSP13 ATPase assay. Twelve compounds demonstrated ATPase inhibition and a consistent dose-response curve. The Heli-SMACC database may serve as a reference for virologists and medicinal chemists working on the development of novel helicase inhibitors. Heli-SMACC is publicly available at https://smacc.mml.unc.edu. We created a curated Helicase-Targeting SMAll Molecule Compound Collection (Heli-SMACC). Heli-SMACC covers 29 human, viral, and bacterial helicases. Twelve of thirty selected compounds demonstrated inhibitory activity in a SARS-CoV-2 NSP13 ATPase Assay. Heli-SMACC is freely available online at https://smacc.mml.unc.edu.
Allosteric binders of ACE2 are promising anti-SARS-CoV-2 agents
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for an acute treatment for the disease. We investigate whether compounds that bind the human ACE2 protein can interrupt SARS-CoV-2 replication without damaging ACE2's natural enzymatic function. Initial compounds were screened for binding to ACE2 but little interruption of ACE2 enzymatic activity. This set of compounds was extended by application of quantitative structure-activity analysis, which resulted in 512 virtual hits for further confirmatory screening. A subsequent SARS-CoV-2 replication assay revealed that five of these compounds inhibit SARS-CoV-2 replication in human cells. Further effort is required to completely determine the antiviral mechanism of these compounds, but they serve as a strong starting point for both development of acute treatments for COVID-19 and research into the mechanism of infection. Competing Interest Statement AT and ENM are co-founders of Predictive, LLC, which develops computational methodologies and software for toxicity prediction. All other authors declare they have nothing to disclose.
The N-ary in the Coal Mine: Avoiding Mixture Model Failure with Proper Validation
Modeling the properties of chemical mixtures is a difficult but important part of any modeling process intended to be applicable to the often messy and impure phenomena of everyday life, including food and environmental safety, healthcare, etc. Part of this difficulty stems from the increased complexity of designing suitable model validation schemes for mixture data, a fact which has been elucidated in previous work only in the case of binary mixture models. We extend these previously defined validation strategies for QSAR modeling of binary mixtures to the more complex case of general, N-ary mixtures and argue that these strategies are applicable to many modeling tasks beyond simple chemical mixtures. Additionally, we propose a method of establishing a baseline model performance for each mixture dataset to be used in model selection comparisons. This baseline is intended to account for the statistical dependence generically present between the properties of mixtures that share constituents. We contend that without such a baseline, estimates of model performance can be dramatically overestimated, and we demonstrate this with multiple case studies using real and simulated data.
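The dependence between mixtures that share constituents can be made concrete with a small sketch of one possible leakage-aware split. This is an illustrative assumption, not the validation scheme defined in the paper: the function name, the `strict` flag, and the toy mixtures are all hypothetical.

```python
def constituent_holdout_split(mixtures, held_out, strict=False):
    """Split N-ary mixtures so held-out constituents never appear in training.

    mixtures: iterable of tuples of constituent identifiers.
    held_out: constituents reserved for testing.
    strict:   if True, a test mixture must consist only of held-out
              constituents; partially overlapping mixtures are discarded.
    """
    held_out = set(held_out)
    train, test, discarded = [], [], []
    for mix in mixtures:
        members = set(mix)
        overlap = members & held_out
        if not overlap:
            train.append(mix)       # no held-out constituent at all
        elif not strict or overlap == members:
            test.append(mix)        # contains (only) held-out constituents
        else:
            discarded.append(mix)   # shares constituents with both sides
    return train, test, discarded


# Hypothetical binary and ternary mixtures over constituents A..G.
mixtures = [("A", "B"), ("A", "C", "D"), ("B", "C"), ("D", "E"), ("E", "F", "G")]

loose = constituent_holdout_split(mixtures, {"E", "F", "G"})
tight = constituent_holdout_split(mixtures, {"E", "F", "G"}, strict=True)
print(loose)
print(tight)
```

In the non-strict split the test mixture ("D", "E") still shares constituent D with the training set, which is the kind of overlap that can inflate apparent performance; the strict variant removes it at the cost of discarding data.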
Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding -- SmallSA -- for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
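The idea of pairing low-dimensional embeddings with a k-d tree can be sketched in plain Python. This is a toy illustration under stated assumptions: the 3-dimensional random vectors stand in for real chemical embeddings, the function names are hypothetical, and nothing here reproduces SmallSA or the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the node
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query (exact search)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Only descend into the far side if the splitting plane is closer
    # than the best match found so far.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best


random.seed(0)
# Stand-in "embeddings": 2000 random points in 3 dimensions (assumption).
embeddings = [tuple(random.random() for _ in range(3)) for _ in range(2000)]
tree = build_kdtree(embeddings)

query = (0.5, 0.5, 0.5)
hit = nearest(tree, query)
brute = min(embeddings, key=lambda p: sq_dist(p, query))
print(hit == brute)
```

The pruning step is what makes low dimensionality matter: with few dimensions most far branches are skipped, while in typical high-dimensional embeddings the search tends to degrade toward a brute-force scan.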
Visualizing Convolutional Neural Network Protein-Ligand Scoring
Protein-ligand scoring is an important step in a structure-based drug design pipeline. Selecting a correct binding pose and predicting the binding affinity of a protein-ligand complex enables effective virtual screening. Machine learning techniques can make use of the increasing amounts of structural data that are becoming publicly available. Convolutional neural network (CNN) scoring functions in particular have shown promise in pose selection and affinity prediction for protein-ligand complexes. Neural networks are known for being difficult to interpret. Understanding the decisions of a particular network can help tune parameters and training data to maximize performance. Visualization of neural networks helps decompose complex scoring functions into pictures that are more easily parsed by humans. Here we present three methods for visualizing how individual protein-ligand complexes are interpreted by 3D convolutional neural networks. We also present a visualization of the convolutional filters and their weights. We describe how the intuition provided by these visualizations aids in network design.
Accurate inference of tree topologies from multiple sequence alignments using deep learning
Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with their own strengths and weaknesses. Both simulation and empirical studies have identified several “zones” of parameter space where accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e. assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. Here we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. While numerous practical challenges remain, these findings suggest that deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.