Catalogue Search | MBRL

GNINA 1.0: molecular docking with deep learning

by Koes, David Ryan , Masuda, Tomohide , Meli, Rocco in Accuracy , Algorithms , Artificial neural networks

2021

Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose has a root mean square deviation below 2 Å (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. Gnina, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
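
The Top1 metric named in this abstract can be illustrated with a minimal sketch. This is not Gnina's evaluation code: the function names `rmsd` and `top1_rate`, the input layout (a ranked list of poses plus one reference pose per target), and the toy coordinates are assumptions made for illustration. Only the 2 Å threshold comes from the abstract. The RMSD here is computed without superposition over atoms assumed to be listed in the same order.

```python
import math


def rmsd(pose, reference):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for a, b in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))


def top1_rate(targets, threshold=2.0):
    """Fraction of targets whose top-ranked pose lies below `threshold` (in Å).

    `targets` is a list of (ranked_poses, reference) pairs (hypothetical layout);
    ranked_poses[0] is the pose the scoring function ranked best.
    """
    hits = sum(1 for poses, ref in targets if rmsd(poses[0], ref) < threshold)
    return hits / len(targets)


# Toy data: two targets sharing the same two-atom reference pose.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # shifted by 0.5 Å
far = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # shifted by 3.0 Å
targets = [([near, far], reference), ([far, near], reference)]
rate = top1_rate(targets)
```

In this toy case only the first target has its best-ranked pose under the threshold, so a better scoring function would be one that also ranks `near` first for the second target.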

Journal Article

Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

by Wu, Zhenxing , Wang, Zhe , Hsieh, Chang-Yu in ADME/T prediction , Algorithms , Chemistry

2021

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

Journal Article

Interpretation of machine learning models using Shapley values: application to compound potency and multi-target activity predictions

by Bajorath, Jürgen , Rodríguez-Pérez, Raquel in Artificial neural networks , Decision trees , Machine learning

2020

Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of and confidence in ML in pharmaceutical research. There is a need for agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for exact calculation of Shapley values for decision tree methods and systematically compare this variant in compound activity and potency value predictions with the model-independent SHAP method. Moreover, new applications of the SHAP analysis approach are presented including interpretation of DNN models for the generation of multi-target activity profiles and ensemble regression models for potency prediction.
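
As background to the Shapley values this abstract refers to, the sketch below computes them exactly by enumerating every feature coalition. It is a brute-force illustration of the game-theoretic definition, not the SHAP library's algorithm and not the tree-specific variant the article evaluates. The function `shapley_values`, the toy value function `toy_value`, and its numbers are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n).

    `value` maps a frozenset of feature indices to the model output obtained
    when only those features are treated as present.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_value(present):
    """Toy model: additive terms for features 0 and 2 plus a 0-1 interaction."""
    out = 0.0
    if 0 in present:
        out += 1.0
    if 2 in present:
        out += 3.0
    if 0 in present and 1 in present:
        out += 2.0
    return out


phi = shapley_values(toy_value, 3)
```

The interaction term is split evenly between features 0 and 1, and the attributions sum to the difference between the full-model output and the empty-coalition output. Full enumeration is exponential in the number of features, which is why dedicated exact methods for specific model classes, such as the decision-tree variant discussed in the abstract, are of practical interest.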

Journal Article

Molecular representations in AI-driven drug discovery: a review and practical guide

by Engkvist, Ola , Mercado, Rocío , David, Laurianne in Big Data in Chemistry , Cheminformatics , Chemistry

2020

The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.

Journal Article

Simple, reliable, and universal metrics of molecular planarity

by Lu, Tian in Characterization and Evaluation of Materials , Chemistry , Chemistry and Materials Science

2021

Planarity is a very important structural characteristic of molecules, which is closely related to many molecular properties. Unfortunately, there is currently no simple, universal, and robust way to measure molecular planarity. In order to fill this evident gap, we propose two metrics of molecular planarity, namely molecular planarity parameter (MPP) and span of deviation from plane (SDP), to quantitatively characterize the planarity of molecules. MPP reflects the overall degree of deviation of the structure from a plane, while SDP represents the span of the structural deviation relative to the fitting plane; the two metrics are complementary to each other. The examples in this article demonstrate that these metrics are well founded and practical. They can be used not only to investigate the planarity of the entire molecule, but also to measure the planarity of local structures, and they can even be employed to study the variation of molecular planarity during a dynamic process. In addition, we also propose a new representation, namely coloring atoms according to their signed deviation distance to the fitting plane. This kind of map allows researchers to intuitively and quickly recognize the position of the atoms in the system relative to the fitting plane. It can be seen from the examples that this representation is very useful in graphically exhibiting molecular planarity. The methods proposed in this work have been implemented in our open-source analysis code Multiwfn, which can be freely obtained via http://sobereva.com/multiwfn. It is simple to use and supports a rich set of input file formats.
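
The two metrics can be sketched as follows, under stated assumptions. The abstract gives no formulas, so this sketch assumes that MPP is the root mean square of the atoms' distances to the fitting plane and that SDP is the difference between the largest and smallest signed distances. The fitting plane is taken as an input here rather than fitted, and the function names and toy coordinates are hypothetical. This is not the Multiwfn implementation.

```python
import math


def signed_deviations(coords, point, normal):
    """Signed distance of each atom from the plane through `point` with normal `normal`."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("plane normal must be non-zero")
    unit = [c / length for c in normal]
    return [
        sum((a - p) * u for a, p, u in zip(atom, point, unit))
        for atom in coords
    ]


def mpp(deviations):
    """Assumed definition: root mean square of the deviations from the plane."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def sdp(deviations):
    """Assumed definition: span between the largest and smallest signed deviation."""
    return max(deviations) - min(deviations)


# Toy saddle-shaped "molecule": atoms alternate 0.1 Å above and below z = 0.
atoms = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1), (1.0, 1.0, 0.1), (0.0, 1.0, -0.1)]
devs = signed_deviations(atoms, point=(0.0, 0.0, 0.0), normal=(0.0, 0.0, 2.0))
mpp_value = mpp(devs)
sdp_value = sdp(devs)
```

The signed deviations computed here are also the quantity the abstract proposes for coloring atoms relative to the fitting plane. In practice the plane would be obtained by a least-squares fit to the atomic coordinates before these functions are applied.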

Journal Article

An open source chemical structure curation pipeline using RDKit

by Atkinson, Francis , Leach, Andrew R. , Bellis, Louisa J. in ChEMBL , Chemistry , Chemistry and Materials Science

2020

Background: The ChEMBL database is one of a number of public databases that contain bioactivity data on small molecule compounds curated from diverse sources. Incoming compounds are typically not standardised according to consistent rules. In order to maintain the quality of the final database and to easily compare and integrate data on the same compound from different sources it is necessary for the chemical structures in the database to be appropriately standardised. Results: A chemical curation pipeline has been developed using the open source toolkit RDKit. It comprises three components: a Checker to test the validity of chemical structures and flag any serious errors; a Standardizer which formats compounds according to defined rules and conventions; and a GetParent component that removes any salts and solvents from the compound to create its parent. This pipeline has been applied to the latest version of the ChEMBL database as well as uncurated datasets from other sources to test the robustness of the process and to identify common issues in database molecular structures. Conclusion: All the components of the structure pipeline have been made freely available for other researchers to use and adapt for their own use. The code is available in a GitHub repository and it can also be accessed via the ChEMBL Beaker webservices. It has been used successfully to standardise the nearly 2 million compounds in the ChEMBL database and the compound validity checker has been used to identify compounds with the most serious issues so that they can be prioritised for manual curation.

Journal Article

COCONUT online: Collection of Open Natural Products database

by Yirik, Mehmet Aziz , Merseburger, Peter , Sorokina, Maria in Chemistry , Chemistry and Materials Science , Citation Typing Ontology (CiTO) Pilot

2021

Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries, as many of them are bioactive. This potential has raised great interest in NP research around the world and in different application fields; as a result, generalistic and thematic NP databases have multiplied over the years. However, there is, at this moment, no online resource gathering all known NPs in just one place, which would greatly simplify NP research and allow computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources and a web interface to browse, search and easily and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.

Journal Article

Molecular graph convolutions: moving beyond fingerprints

by Riley, Patrick , Pande, Vijay , McCloskey, Kevin in Animal Anatomy , Chemistry , Chemistry and Materials Science

2016

Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

Journal Article

Mordred: a molecular descriptor calculator

by Kawashita, Norihito , Takagi, Tatsuya , Moriwaki, Hirotomo in Calculation software , Cheminformatics , Chemistry

2018

Molecular descriptors are widely employed to represent molecular characteristics in cheminformatics. Various molecular-descriptor-calculation software programs have been developed. However, users of those programs must contend with several issues, including software bugs, insufficient update frequencies, and software licensing constraints. To address these issues, we propose Mordred, a descriptor-calculation software application that can calculate more than 1800 two- and three-dimensional descriptors. It is freely available via GitHub. Mordred can be easily installed and used in the command line interface, as a web application, or as a high-flexibility Python package on all major platforms (Windows, Linux, and macOS). Performance benchmark results show that Mordred is at least twice as fast as the well-known PaDEL-Descriptor and it can calculate descriptors for large molecules, which cannot be accomplished by other software. Owing to its good performance, convenience, number of descriptors, and lax licensing constraints, Mordred is a promising choice of molecular descriptor calculation software that can be utilized for cheminformatics studies, such as those on quantitative structure–property relationships.

Journal Article

van der Waals potential: an important complement to molecular electrostatic potential in studying intermolecular interactions

by Lu, Tian , Chen, Qinxue in Characterization and Evaluation of Materials , Chemistry , Chemistry and Materials Science

2020

Electrostatics and van der Waals (vdW) interactions are two major components of intermolecular weak interactions. Electrostatic potential has been a very popular function for revealing the electrostatic interaction between the system under study and other species, while the role of the vdW potential has been less recognized and long ignored. In this paper, we explicitly present the definition of the vdW potential, describe its implementation details, and demonstrate its practical value through several examples. We hope this work can draw researchers’ attention to the vdW potential and promote its application in studies of weak interactions. Calculation, visualization, and quantitative analysis of the vdW potential are supported by our freely available code Multiwfn (http://sobereva.com/multiwfn).

Journal Article
