Catalogue Search | MBRL
Explore the vast range of titles available.
15,038 result(s) for "Theoretical and Computational Chemistry"
GNINA 1.0: molecular docking with deep learning
by Koes, David Ryan; Masuda, Tomohide; Meli, Rocco
in Accuracy, Algorithms, Artificial neural networks
2021
Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose is better than 2 Å root mean square deviation (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. Gnina, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
Journal Article
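The Top1 figure quoted in the GNINA abstract above is defined there as the percentage of targets whose top-ranked pose is better than 2 Å RMSD. A minimal sketch of that calculation follows; the function name, the dictionary layout, and the example RMSD values are hypothetical and are not taken from the Gnina code base.

```python
from typing import Mapping


def top1_success_rate(top_pose_rmsd: Mapping[str, float],
                      threshold: float = 2.0) -> float:
    """Return the percentage of targets whose top-ranked pose has an
    RMSD strictly below `threshold` (in angstroms)."""
    if not top_pose_rmsd:
        raise ValueError("at least one target is required")
    hits = sum(1 for rmsd in top_pose_rmsd.values() if rmsd < threshold)
    return 100.0 * hits / len(top_pose_rmsd)


# Hypothetical RMSD values (in angstroms) of the top-ranked pose
# for four made-up targets; these are not results from the paper.
example = {"target_a": 0.8, "target_b": 1.9, "target_c": 2.4, "target_d": 5.1}
top1 = top1_success_rate(example)
```

Here two of the four made-up targets fall below the 2 Å cutoff, so `top1` is 50.0. A pose at exactly 2 Å is not counted, which is one reading of "better than 2 Å".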
Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models
2021
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
Journal Article
Molecular representations in AI-driven drug discovery: a review and practical guide
by Engkvist, Ola; Mercado, Rocío; David, Laurianne
in Big Data in Chemistry, Cheminformatics, Chemistry
2020
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Journal Article
An open source chemical structure curation pipeline using RDKit
by Atkinson, Francis; Leach, Andrew R.; Bellis, Louisa J.
in ChEMBL, Chemistry, Chemistry and Materials Science
2020
Background
The ChEMBL database is one of a number of public databases that contain bioactivity data on small molecule compounds curated from diverse sources. Incoming compounds are typically not standardised according to consistent rules. In order to maintain the quality of the final database and to easily compare and integrate data on the same compound from different sources it is necessary for the chemical structures in the database to be appropriately standardised.
Results
A chemical curation pipeline has been developed using the open source toolkit RDKit. It comprises three components: a Checker to test the validity of chemical structures and flag any serious errors; a Standardizer which formats compounds according to defined rules and conventions; and a GetParent component that removes any salts and solvents from the compound to create its parent. This pipeline has been applied to the latest version of the ChEMBL database as well as uncurated datasets from other sources to test the robustness of the process and to identify common issues in database molecular structures.
Conclusion
All the components of the structure pipeline have been made freely available for other researchers to use and adapt for their own use. The code is available in a GitHub repository and it can also be accessed via the ChEMBL Beaker webservices. It has been used successfully to standardise the nearly 2 million compounds in the ChEMBL database and the compound validity checker has been used to identify compounds with the most serious issues so that they can be prioritised for manual curation.
Journal Article
Mordred: a molecular descriptor calculator
by Kawashita, Norihito; Takagi, Tatsuya; Moriwaki, Hirotomo
in Calculation software, Cheminformatics, Chemistry
2018
Molecular descriptors are widely employed to present molecular characteristics in cheminformatics. Various molecular-descriptor-calculation software programs have been developed. However, users of those programs must contend with several issues, including software bugs, insufficient update frequencies, and software licensing constraints. To address these issues, we propose Mordred, a developed descriptor-calculation software application that can calculate more than 1800 two- and three-dimensional descriptors. It is freely available via GitHub. Mordred can be easily installed and used in the command line interface, as a web application, or as a high-flexibility Python package on all major platforms (Windows, Linux, and macOS). Performance benchmark results show that Mordred is at least twice as fast as the well-known PaDEL-Descriptor and it can calculate descriptors for large molecules, which cannot be accomplished by other software. Owing to its good performance, convenience, number of descriptors, and a lax licensing constraint, Mordred is a promising choice of molecular descriptor calculation software that can be utilized for cheminformatics studies, such as those on quantitative structure–property relationships.
Journal Article
Molecular de-novo design through deep reinforcement learning
by Olivecrona, Marcus; Engkvist, Ola; Blaschke, Thomas
in Celecoxib, Chemistry, Chemistry and Materials Science
2017
This work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.
Journal Article
Review on natural products databases: where to find data in 2020
2020
Natural products (NPs) have been the centre of attention of the scientific community in the last decades and the interest around them continues to grow incessantly. As a consequence, in the last 20 years, there was a rapid multiplication of various databases and collections as generalistic or thematic resources for NP information. In this review, we establish a complete overview of these resources, and the numbers are overwhelming: over 120 different NP databases and collections were published and re-used since 2000. 98 of them are still somehow accessible and only 50 are open access. The latter include not only databases but also big collections of NPs published as supplementary material in scientific publications and collections that were backed up in the ZINC database for commercially-available compounds. Some databases, even ones published relatively recently, are no longer accessible, which leads to a dramatic loss of data on NPs. The data sources are presented in this manuscript, together with the comparison of the content of open ones. With this review, we also compiled the open-access natural compounds in one single dataset, a COlleCtion of Open NatUral producTs (COCONUT), which is available on Zenodo and contains structures and sparse annotations for over 400,000 non-redundant NPs, which makes it the biggest open collection of NPs available to this date.
Journal Article
Simple, reliable, and universal metrics of molecular planarity
by Lu, Tian
in Characterization and Evaluation of Materials, Chemistry, Chemistry and Materials Science
2021
Planarity is a very important structural characteristic of molecules, which is closely related to many molecular properties. Unfortunately, there is currently no simple, universal, and robust way to measure molecular planarity. In order to fill this evident gap, we propose two metrics of molecular planarity, namely molecular planarity parameter (MPP) and span of deviation from plane (SDP), to quantitatively characterize planarity of molecules. MPP reflects the overall degree of deviation of the structure from a plane, while SDP represents the span of the structural deviation relative to the fitting plane; the two metrics are complementary to each other. The examples in this article demonstrate that these metrics have strong rationality and practicality. They can be used not only to investigate the planarity of the entire molecule, but also to measure the planarity of local structures, and they can even be employed to study variation of molecular planarity during a dynamic process. In addition, we also propose a new representation, namely coloring atoms according to their signed deviation distance to the fitting plane. This kind of map allows researchers to intuitively and quickly recognize the position of the atoms in the system relative to the fitting plane. It can be seen from the examples that this representation is very useful in graphically exhibiting molecular planarity. The methods proposed in this work have been implemented in our open-source analysis code Multiwfn, which can be freely obtained via http://sobereva.com/multiwfn. Usage is very simple, and a rich set of file formats is supported as input.
Journal Article
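The two planarity metrics in the abstract above can be illustrated with a short sketch. The abstract does not give formulas, so this sketch assumes that MPP is the root mean square of the atoms' distances to the fitting plane and that SDP is the largest signed deviation minus the smallest. It also assumes the fitting plane is already known and passed in as a point and a normal vector, rather than fitted. All function names are hypothetical, and this is not the Multiwfn implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def signed_deviations(coords: Sequence[Vec3], point: Vec3,
                      normal: Vec3) -> List[float]:
    """Signed distance of each atom to the plane through `point`
    with normal vector `normal` (the plane is assumed to be given)."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal vector must be non-zero")
    unit = tuple(c / length for c in normal)
    return [sum((a - p) * u for a, p, u in zip(atom, point, unit))
            for atom in coords]


def mpp(deviations: Sequence[float]) -> float:
    """Assumed MPP: root mean square of the deviations from the plane."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def sdp(deviations: Sequence[float]) -> float:
    """Assumed SDP: span between the largest and smallest signed deviation."""
    return max(deviations) - min(deviations)


# Made-up four-atom ring puckered by 0.1 length units above and below z = 0.
atoms = [(1.0, 0.0, 0.1), (0.0, 1.0, -0.1), (-1.0, 0.0, 0.1), (0.0, -1.0, -0.1)]
devs = signed_deviations(atoms, point=(0.0, 0.0, 0.0), normal=(0.0, 0.0, 1.0))
ring_mpp = mpp(devs)
ring_sdp = sdp(devs)
```

For this made-up ring every atom lies 0.1 from the plane, so the assumed MPP is about 0.1 and the assumed SDP is about 0.2. The signed values in `devs` are the same quantities the abstract proposes using to color atoms.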
COCONUT online: Collection of Open Natural Products database
by Yirik, Mehmet Aziz; Merseburger, Peter; Sorokina, Maria
in Chemistry, Chemistry and Materials Science, Citation Typing Ontology (CiTO) Pilot
2021
Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries as many of them are bioactive. This potential raised great interest in NP research around the world and in different application fields; therefore, over the years, a multiplication of generalistic and thematic NP databases has been observed. However, there is, at this moment, no online resource regrouping all known NPs in just one place, which would greatly simplify NP research and allow computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources and a web interface to browse, search and easily and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.
Journal Article
Adsorption of phenylacetylene and styrene on palladium surface: a DFT study
by Finkelshtein, Eugene I.; Shamsiev, Ravshan S.
in 11th European Conference on Theoretical and Computational Chemistry (EuCO-TCC 2017), Adsorption, Characterization and Evaluation of Materials
2018
In this paper, a DFT study of phenylacetylene and styrene interactions with different surfaces ({111}, {100}, edge and corner) of the Pd₈₆ cluster was performed. The results obtained show that the interaction of phenylacetylene with Pd{111} or Pd{100} surfaces is stronger than that of styrene, but on the edges of Pd₈₆ the adsorption of styrene is preferred. The results agree with experimental observations, namely, with the nanoparticle size effect in the PhA semihydrogenation on Pd catalysts.
Journal Article