Catalogue Search | MBRL

SYBA: Bayesian estimation of synthetic accessibility of organic compounds

by Čmelo, Ivan , Kolář, Michal , Voršilák, Milan in Accessibility , Bayesian analysis , Bernoulli naïve Bayes

2020

SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, that was utilized as a baseline method, as well as with other two methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon the optimization of SAScore threshold (that changes from 6.0 to – 4.5), SAScore yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.

Journal Article

Share this book

Add to My Shelf

QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction

by Bender, Andreas , Svozil, Daniel , Cortés-Ciriano, Isidro in Affinity , Affinity fingerprints , Big Data in Chemistry

2020

Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using K i , K d , IC 50 and EC 50 data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC 50 data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~ 0.65–0.95 pIC 50 units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76–1.00 pIC 50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02–0.08 pIC 50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression .

Journal Article

Share this book

Add to My Shelf

InCHlib – interactive cluster heatmap for web applications

by Bartůněk, Petr , Svozil, Daniel , Škuta, Ctibor in Application programming interface , Big data , Chemistry

2014

Background Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called ‘cluster heatmap’ is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. Results We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust . Conclusions The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool which application domain is not limited to the life sciences only.

Journal Article

Share this book

Add to My Shelf

Molpher: a software framework for systematic chemical space exploration

by Voršilák, Milan , Svozil, Daniel , Škoda, Petr in Algorithms , Chemistry , Chemistry and Materials Science

2014

Background Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. Results In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term ‘molecular morphing’, Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called ‘morphing operators’ that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. Conclusions Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.

Journal Article

Share this book

Add to My Shelf

Probes & Drugs portal: an interactive, open data resource for chemical biology

by Skuta, Ctibor , Jindrich, Jindrich , Popr, Martin in 49/105 , 631/114/129 , 631/154

2017

Journal Article

Share this book

Add to My Shelf

Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of α/ γ Conformers

by Orozco, Modesto , Svozil, Daniel , Marchán, Iván in Biophysical Theory and Modeling , Computer Simulation , DNA - chemistry

2007

We present here the parmbsc0 force field, a refinement of the AMBER parm99 force field, where emphasis has been made on the correct representation of the α/ γ concerted rotation in nucleic acids (NAs). The modified force field corrects overpopulations of the α/ γ = (g+,t) backbone that were seen in long (more than 10 ns) simulations with previous AMBER parameter sets (parm94-99). The force field has been derived by fitting to high-level quantum mechanical data and verified by comparison with very high-level quantum mechanical calculations and by a very extensive comparison between simulations and experimental data. The set of validation simulations includes two of the longest trajectories published to date for the DNA duplex (200 ns each) and the largest variety of NA structures studied to date (15 different NA families and 97 individual structures). The total simulation time used to validate the force field includes near 1 μs of state-of-the-art molecular dynamics simulations in aqueous solution.

Journal Article

Share this book

Add to My Shelf

Nonpher: computational method for design of hard-to-synthesize structures

by Voršilák, Milan , Svozil, Daniel in Biological activity , Chemical compounds , Chemistry

2017

In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinctive classes such as active/nonactive or toxic/nontoxic. To train a classifier, a training data set must consist of examples from both positive and negative classes. While a biological activity or toxicity can be experimentally measured, another important molecular property, a synthetic feasibility, is a more abstract feature that can’t be easily assessed. In the present paper, we introduce Nonpher, a computational method for the construction of a hard-to-synthesize virtual library. Nonpher is based on a molecular morphing algorithm in which new structures are iteratively generated by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures not overly complex, but just right hard-to-synthesize. Nonpher results were compared with SAscore and dense region (DR), other two methods for the generation of hard-to-synthesize compounds. Random forest classifier trained on Nonpher data achieves better results than models obtained using SAscore and DR data.

Journal Article

Share this book

Add to My Shelf

Building community trust and stewardship through a multi-indicator approach to monitoring aquatic ecosystem restoration

by Powell, Megan , Scanes, Peter , Potts, Jaimie

2025

Journal Article

Share this book

Add to My Shelf

mmView: a web-based viewer of the mmCIF format

by Čech, Petr , Svozil, Daniel in Analysis , Biomedical and Life Sciences , Biomedicine

2011

Background Structural biomolecular data are commonly stored in the PDB format. The PDB format is widely supported by software vendors because of its simplicity and readability. However, the PDB format cannot fully address many informatics challenges related to the growing amount of structural data. To overcome the limitations of the PDB format, a new textual format mmCIF was released in June 1997 in its version 1.0. mmCIF provides extra information which has the advantage of being in a computer readable form. However, this advantage becomes a disadvantage if a human must read and understand the stored data. While software tools exist to help to prepare mmCIF files, the number of available systems simplifying the comprehension and interpretation of the mmCIF files is limited. Findings In this paper we present mmView - a cross-platform web-based application that allows to explore comfortably the structural data of biomacromolecules stored in the mmCIF format. The mmCIF categories can be easily browsed in a tree-like structure, and the corresponding data are presented in a well arranged tabular form. The application also allows to display and investigate biomolecular structures via an integrated Java application Jmol. Conclusions The mmView software system is primarily intended for educational purposes, but it can also serve as a useful research tool. The mmView application is offered in two flavors: as an open-source stand-alone application (available from http://sourceforge.net/projects/mmview ) that can be installed on the user's computer, and as a publicly available web server.

Journal Article

Share this book

Add to My Shelf

sCMOS noise-correction algorithm for microscopy images

by Liu, Sheng , Mlodzianoski, Michael J , Huang, Fang in 14/63 , 631/114/1564 , 631/1647/245/2225

2017

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter