Catalogue Search | MBRL

Cheminformatics and its applications

by Stefaniu, Amalia, editor , Rasul, Azhar, editor , Hussain, Ghulam, (Scientist), editor in Cheminformatics. , Chemistry Data processing.

2020

Book

Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions

by Varnek, Alexandre , Gimadiev, Timur R. , Baskin, Igor I. in Algorithms , Business metrics , Chemical compounds

2020

Defining a model’s applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models’ performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers was tested. As a result, several “best” AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.

Journal Article

Computational biology and chemistry

by Behzadi, Payam, 1973- editor , Bernabò, Nicola, editor in Computational chemistry. , Cheminformatics. , Computational biology.

2020

The use of computers and software tools in biochemistry (biology) has led to a deep revolution in basic sciences and medicine. Bioinformatics and systems biology are the direct results of this revolution. With the involvement of computers, software tools, and internet services in scientific disciplines comprising biology and chemistry, new terms, technologies, and methodologies appeared and became established. Bioinformatic software tools, versatile databases, and easy internet access resulted in the emergence of computational biology and chemistry. Today, we have new types of surveys and laboratories including 'in silico studies' and 'dry labs' in which bioinformaticians conduct their investigations to gain invaluable outcomes. These features have led to three-dimensional illustrations of different molecules and complexes to get a better understanding of nature.

Book

Discovery of novel chemical reactions by deep generative recurrent neural network

by Sidorov, Pavel , Varnek, Alexandre , Baskin, Igor I. in 639/638/549 , 639/638/630 , Artificial intelligence

2021

The “creativity” of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that “creative” AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed “SMILES/CGR” strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.

Journal Article

Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR

by Schneider, Gisbert , Tropsha, Alexander , Varnek, Alexandre in Deep learning

2024

Quantitative structure–activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term ‘deep QSAR’. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design. Advances with deep learning, the growth of databases of molecules for virtual screening and improvements in computational power have supported the emergence of a new field of quantitative structure–activity relationship (QSAR) modelling applications that Tropsha et al. term ‘deep QSAR’. This article discusses key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning, and the use of deep QSAR models in structure-based virtual screening.

Journal Article

VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder

by Swainston, Neil , O’Hagan, Steve , Roberts, Timothy J. in Algorithms , cheminformatics , Cheminformatics - methods

2020

Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
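As a rough illustration of how distances between latent vectors can serve as a similarity measure, the sketch below compares two vectors directly. The abstract does not say which distance VAE-Sim uses, so the Euclidean distance, the function names `latent_distance` and `latent_similarity`, and the mapping of distance onto a 0 to 1 score are all assumptions made for this example, not the authors' method. The vectors stand in for the output of a trained encoder, which is not reproduced here.

```python
import math


def latent_distance(z_a, z_b):
    """Euclidean distance between two latent vectors of equal length.

    Assumption: Euclidean distance is used only for illustration; the
    abstract does not specify the distance applied to the VAE vectors.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))


def latent_similarity(z_a, z_b):
    """Hypothetical mapping of distance to a score in (0, 1].

    Identical vectors give 1.0; the score decreases as distance grows.
    """
    return 1.0 / (1.0 + latent_distance(z_a, z_b))


if __name__ == "__main__":
    # Placeholder vectors standing in for encoder outputs of two molecules.
    z_first = [0.0, 0.0]
    z_second = [3.0, 4.0]
    print(latent_distance(z_first, z_second))
    print(latent_similarity(z_first, z_first))
```

Because the comparison is a single pass over two short vectors, it is cheap to compute once the encoding exists, which is consistent with the abstract's description of the metric as rapidly calculated.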

Journal Article

Artificial intelligence for natural product drug discovery

by Guyomard, Pierre , Gorostiola González, Marina , Elsayed, Somayah S in Artificial intelligence , Biological activity , Natural products

2023

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation. Advances in computational omics technologies are enabling access to the hidden diversity of natural products, and artificial intelligence approaches are facilitating key steps in harnessing the therapeutic potential of such compounds, including biological activity prediction. This article discusses synergies between these fields to effectively identify drug candidates from the plethora of molecules produced by nature, and how to address the challenges in realizing the potential of these synergies.

Journal Article

Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set

by van Vlijmen, Herman W. T. , Papadatos, George , Kowalczyk, Wojtek in 7th Joint Sheffield Conference on Cheminformatics , Artificial neural networks , Bayesian analysis

2017

The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method (‘DNN_PCM’) performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized ‘DNN_PCM’). 
Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols.
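For readers unfamiliar with one of the two metrics named in this abstract, the Matthews Correlation Coefficient for a binary classifier can be computed from the four confusion-matrix counts. The sketch below uses the standard formula; the function name `mcc` and the convention of returning 0.0 when the denominator is zero are choices made for this example and are not taken from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives. The result lies between -1 and 1.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(mcc(tp=5, tn=5, fp=0, fn=0))  # every prediction correct
    print(mcc(tp=0, tn=0, fp=5, fn=5))  # every prediction wrong
```

Unlike plain accuracy, the coefficient uses all four counts, so it stays informative when active and inactive compounds are heavily imbalanced, as is common in bioactivity data.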

Journal Article

A Computational Approach to Predictive Modeling Using Connection-Based Topological Descriptors: Applications in Coumarin Anti-Cancer Drug Properties

by Hayat, Sakander , Wazzan, Suha in Antimitotic agents , Antineoplastic agents , Antineoplastic Agents - chemistry

2025

Cheminformatics bridges chemistry, computer science, and information technology to predict chemical behaviors using quantitative structure–property relationships (QSPRs). This study advances QSPR modeling by introducing novel connection-based graphical invariants, specifically designed to enhance the predictive accuracy for physicochemical properties (PCPs) of benzenoid hydrocarbons (BHs). Employing cutting-edge computational methods, we evaluate these invariants against established descriptors in modeling the normal boiling point and standard heat of formation. The findings reveal superior predictive performance by newly proposed invariants, such as the sum-connectivity connection index, outperforming traditional indices like the Zagreb connection indices. Furthermore, we extend these methods to model the physicochemical properties of coumarin-related anti-cancer drugs, demonstrating their potential in drug development. The statistical analysis suggests that the most appropriate structure–property models are nonlinear. This work not only proposes robust tools for PCP estimation but also advocates for rigorous testing of descriptors to ensure relevance in cheminformatics.

Journal Article

QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery

by Melo-Filho, Cleber C. , Andrade, Carolina Horta , Neves, Bruno J. in Best practice , Biological activity , cheminformatics

2018

Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small molecules for new hits with desired properties that can then be tested experimentally. Similar to other computational approaches, the intention of VS is not to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the number of candidates to be tested experimentally, and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations because it saves time, cost, resources, and labor. Among the VS approaches, quantitative structure-activity relationship (QSAR) analysis is the most powerful method due to its high and fast throughput and good hit rate. As the first preliminary step of a QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chemical descriptors are calculated on different levels of representation of molecular structure, ranging from 1D to nD, and then correlated with the biological property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biological property of novel compounds. Although the experimental testing of computational hits is not an inherent part of QSAR methodology, it is highly desired and should be performed as an ultimate validation of developed models. In this mini-review, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying promising compounds with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.

Journal Article
