Catalogue Search | MBRL

Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning

by Miasojedow, Błażej; Skoraczyński, Grzegorz; Kitlas, Mateusz in Accessibility, Algorithms, Analysis

2023

Modern computer-assisted synthesis planning tools provide strong support for this problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available; however, their applicability and correctness have been studied only to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores under common test conditions. In the present work, we assess whether synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compound database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether the synthetic accessibility scores SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate whether synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial-solution search trees, their structure, and complexity parameters such as the number of nodes or treewidth. We confirm that synthetic accessibility scores in most cases discriminate feasible molecules from infeasible ones well and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning; however, they need to be carefully crafted for retrosynthesis planning algorithms. The source code of this work is publicly available at https://github.com/grzsko/ASAP.
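One way to quantify how well a score predicts retrosynthesis outcomes is the area under the ROC curve, computed here as a pairwise rank statistic. This is a minimal sketch and not the evaluation code of the paper: the function name `auroc`, the toy scores, and the solved/unsolved labels are all hypothetical. It also assumes the usual SAscore convention that lower values mean easier synthesis, so the scores are negated before ranking.

```python
def auroc(scores, solved):
    """Probability that a randomly chosen solved molecule is ranked
    above a randomly chosen unsolved one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, solved) if y]
    neg = [s for s, y in zip(scores, solved) if not y]
    if not pos or not neg:
        raise ValueError("need at least one solved and one unsolved molecule")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one accessibility score per molecule and whether
# a retrosynthesis planner found a route for it.
sa_scores = [2.1, 3.0, 6.5, 4.2, 7.8]
solved = [True, True, False, True, False]

# Negate so that "higher is more accessible" before ranking.
toy_auc = auroc([-s for s in sa_scores], solved)
```

A value near 1.0 means the score separates solved from unsolved molecules almost perfectly, while 0.5 corresponds to random ranking.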

Journal Article

SYBA: Bayesian estimation of synthetic accessibility of organic compounds

by Čmelo, Ivan; Kolář, Michal; Voršilák, Milan in Accessibility, Bayesian analysis, Bernoulli naïve Bayes

2020

SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, which was used as a baseline method, as well as with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon optimization of the SAScore threshold (which changes from 6.0 to approximately 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based merely on fragment contributions, it can be used to analyze the contribution of individual molecular parts to a compound's synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
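The fragment-contribution idea can be illustrated with a small Bernoulli naïve Bayes sketch. It is not the SYBA package's API: the function names, the Laplace smoothing constant `alpha`, the assumption of equal class priors, and the toy fragment names are all hypothetical, and fragments are assumed to be already extracted as sets of strings.

```python
import math
from collections import Counter


def fit_fragment_scores(es_molecules, hs_molecules, alpha=1.0):
    """Return {fragment: (score_if_present, score_if_absent)}.

    Each molecule is a set of fragment identifiers. Scores are log-odds
    of ES versus HS; alpha is a Laplace smoothing constant (an assumption
    of this sketch).
    """
    es_counts = Counter(f for mol in es_molecules for f in set(mol))
    hs_counts = Counter(f for mol in hs_molecules for f in set(mol))
    n_es, n_hs = len(es_molecules), len(hs_molecules)
    scores = {}
    for frag in set(es_counts) | set(hs_counts):
        p_es = (es_counts[frag] + alpha) / (n_es + 2 * alpha)
        p_hs = (hs_counts[frag] + alpha) / (n_hs + 2 * alpha)
        scores[frag] = (
            math.log(p_es / p_hs),              # fragment present
            math.log((1 - p_es) / (1 - p_hs)),  # fragment absent
        )
    return scores


def score_molecule(fragments, scores):
    """Sum per-fragment contributions; positive leans ES, negative leans HS."""
    fragments = set(fragments)
    return sum(
        present if frag in fragments else absent
        for frag, (present, absent) in scores.items()
    )


# Hypothetical toy training sets of fragment sets.
es_train = [{"benzene", "amide"}, {"benzene", "ester"}, {"amide"}]
hs_train = [{"spiro", "bridged"}, {"bridged", "benzene"}, {"spiro"}]
fragment_scores = fit_fragment_scores(es_train, hs_train)
```

Because the total is a plain sum, the per-fragment terms can be inspected directly to see which parts of a molecule push the score toward HS.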

Journal Article

Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore

by Chen, Shuan; Jung, Yousung in Accessibility, Analysis, Building-block accessibility

2024

Synthetic accessibility prediction is the task of estimating how easily a given molecule can be synthesized in the laboratory, and it plays a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing times make them impractical for large-scale molecule screening. On the other hand, existing rapid synthetic accessibility estimation methods offer speed but typically lack integration with actual synthesis routes and building block information. In this work, we introduce BR-SAScore, an enhanced version of SAScore that integrates the available building block information (B) and reaction knowledge (R) from synthesis planning programs into the scoring process. In particular, we differentiate fragments inherent in building blocks and fragments to be derived from synthesis (reactions) when scoring synthetic accessibility. Compared to existing methods, our experimental findings demonstrate that BR-SAScore offers more accurate and precise identification of a molecule's synthetic accessibility by the synthesis planning program with a fast calculation time. Moreover, we illustrate how BR-SAScore provides chemically interpretable results, aligning with the capability of the synthesis planning program embedded with the same reaction knowledge and available building blocks. Scientific contribution: We introduce BR-SAScore, an extension of SAScore, to estimate the synthetic accessibility of molecules by leveraging known building-block and reactivity information. In our experiments, BR-SAScore shows superior performance in predicting molecular synthetic accessibility compared to previous methods, including SAScore and deep-learning models, while requiring significantly less computation time. In addition, we show that BR-SAScore is able to precisely identify the chemical fragment contributing to the synthetic infeasibility, holding great potential for future molecule synthesizability optimization.

Journal Article

eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates

by Wu, Hsiao-Chun; Naderi, Misagh; Liu, Tairan in Accessibility, Algorithms, Alzheimer's disease

2019

Background: The efficiency of drug development, defined as the number of successfully launched new pharmaceuticals normalized by financial investments, has significantly declined. Nonetheless, recent advances in high-throughput experimental techniques and computational modeling promise reductions in the costs and development times required to bring new drugs to market. The prediction of toxicity of drug candidates is one of the important components of modern drug discovery. Results: In this work, we describe eToxPred, a new approach to reliably estimate the toxicity and synthetic accessibility of small organic compounds. eToxPred employs machine learning algorithms trained on molecular fingerprints to evaluate drug candidates. The performance is assessed against multiple datasets containing known drugs, potentially hazardous chemicals, natural products, and synthetic bioactive compounds. Encouragingly, eToxPred predicts the synthetic accessibility with a mean square error of only 4% and the toxicity with an accuracy as high as 72%. Conclusions: eToxPred can be incorporated into protocols to construct custom libraries for virtual screening in order to filter out those drug candidates that are potentially toxic or would be difficult to synthesize. It is freely available as stand-alone software at https://github.com/pulimeng/etoxpred.

Journal Article

LEADD: Lamarckian evolutionary algorithm for de novo drug design

by Kerstjens, Alan; De Winter, Hans in Accessibility, Algorithms, Chemistry

2022

Given an objective function that predicts key properties of a molecule, goal-directed de novo molecular design is a useful tool to identify molecules that maximize or minimize said objective function. Nonetheless, a common drawback of these methods is that they tend to design synthetically unfeasible molecules. In this paper we describe a Lamarckian evolutionary algorithm for de novo drug design (LEADD). LEADD attempts to strike a balance between optimization power, synthetic accessibility of designed molecules and computational efficiency. To increase the likelihood of designing synthetically accessible molecules, LEADD represents molecules as graphs of molecular fragments, and limits the bonds that can be formed between them through knowledge-based pairwise atom type compatibility rules. A reference library of drug-like molecules is used to extract fragments, fragment preferences and compatibility rules. A novel set of genetic operators that enforce these rules in a computationally efficient manner is presented. To sample chemical space more efficiently we also explore a Lamarckian evolutionary mechanism that adapts the reproductive behavior of molecules. LEADD has been compared to both standard virtual screening and a comparable evolutionary algorithm using a standardized benchmark suite and was shown to be able to identify fitter molecules more efficiently. Moreover, the designed molecules are predicted to be easier to synthesize than those designed by other evolutionary algorithms.
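The notion of pairwise atom type compatibility rules extracted from a reference library can be sketched as a simple lookup table. This is an illustrative assumption about how such rules might be stored, not LEADD's implementation: the function names and the atom type labels are hypothetical.

```python
from collections import defaultdict


def learn_compatibility(reference_bonds):
    """Build a symmetric table of atom type pairs seen bonded together
    between fragments in a reference library."""
    rules = defaultdict(set)
    for type_a, type_b in reference_bonds:
        rules[type_a].add(type_b)
        rules[type_b].add(type_a)
    return dict(rules)


def can_bond(rules, type_a, type_b):
    """Allow a new inter-fragment bond only if the pair was observed."""
    return type_b in rules.get(type_a, set())


# Hypothetical inter-fragment bonds observed in a drug-like library,
# written as pairs of atom type labels.
observed = [("C.ar", "N.am"), ("C.3", "O.3"), ("C.ar", "C.3")]
rules = learn_compatibility(observed)
```

A genetic operator could call `can_bond` before connecting two fragments, which restricts the search to connections that have precedent in the reference molecules.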

Journal Article

DeepSA: a deep-learning driven predictor of compound synthesis accessibility

by Wang, Shihang; Wang, Lin; Bai, Fang in Accessibility, Algorithms, Analysis

2023

With the continuous development of artificial intelligence technology, more and more computational models for generating new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy or difficult to synthesize, which refers to the synthetic accessibility of compounds. In this study, a deep-learning-based computational model called DeepSA was proposed to predict the synthesis accessibility of compounds, which provides a useful tool for choosing molecules. DeepSA is a chemical language model that was developed by training on a dataset of 3,593,053 molecules using various natural language processing (NLP) algorithms, offering advantages over state-of-the-art methods and having a much higher area under the receiver operating characteristic curve (AUROC), i.e., 89.6%, in discriminating those molecules that are difficult to synthesize. This helps users select less expensive molecules for synthesis, reducing the time and cost required for drug discovery and development. Interestingly, a comparison of DeepSA with a Graph Attention-based method shows that using SMILES alone can also efficiently visualize and extract a compound’s informative features. DeepSA is available online on our group's web server (https://bailab.siais.shanghaitech.edu.cn/services/deepsa/), and the code is available at https://github.com/Shihang-Wang-58/DeepSA.

Journal Article

Modeling of synthetic accessibility of potential drug molecules containing five-membered aromatic heterocycles

by Bondarev, N; Palyulin, V. A; Ivanenkov, Ya. A in Accessibility, Algorithms, Automation

2025

A new cheminformatics approach was proposed for modeling synthetic accessibility (SA), which is one of the most critical problems in computer-aided design of novel drugs. The approach is based on filtering molecular structures that contain synthetically irrelevant substructures using the hierarchically organized library of SMARTS substructures, which considers the fragment’s local environment. It compensates for the drawbacks of the existing SA modeling methods. The proposed approach was validated using five-membered aromatic heterocycles. Applying this algorithm significantly reduces the number of false positive predictions commonly encountered when employing traditional SA modeling methods. In its current form, the proposed algorithm can serve as an efficient additional filtering layer. A hierarchical substructure filtering algorithm has been described for the first time, along with a novel method for the automated merging of SMARTS patterns based on atomic primitives.

Journal Article

RetroScore: graph edit distance-guided retrosynthesis for accessibility scoring with route metrics

by Lin, Jianping; Zhou, Xiaofei; Gao, Sinuo in Accessibility, Algorithms, Analysis

2025

Molecular generation is a critical method in drug design, but its practical application is often limited by the difficulty of synthesizing the generated molecules. To address this challenge, we present RetroScore, a synthetic accessibility evaluation framework guided by multistep retrosynthesis. Our methodology integrates the semi-template model Graph2Edits with the multistep retrosynthesis planning algorithm Retro*, forming the Graph2Edits-Retro*d system. By incorporating the green chemistry metric of graph edit distance into the reaction cost function and a multistage screening protocol, this system identifies optimal routes while balancing reliability, synthetic efficiency, and economic feasibility. Benchmark evaluations demonstrate a 97.37% planning success rate with balanced optimization across route length, confidence score, and graph edit distance. In the molecular generation task, RetroScore outperforms six of the seven synthetic accessibility metrics, yielding molecules with enhanced synthetic accessibility profiles across heterogeneous evaluation frameworks. To facilitate practical implementation, we developed an open-access web platform for automated retrosynthesis route prediction and RetroScore calculation, providing researchers with rapid synthetic accessibility assessments. The RetroScore web server is publicly accessible at http://aidd.bioai-global.com/RetroScore/, and the source code is available at https://github.com/Snowgao320/RetroScore. Scientific contribution: In this study, we present RetroScore, a novel multistep retrosynthesis planning-guided framework for evaluating molecular synthetic accessibility. The framework introduces a novel scoring function that incorporates graph edit distance to explicitly optimize for atom economy (a green chemistry estimator) in addition to route reliability and length, marking a significant advance over prior synthetic accessibility (SA) metrics focused on molecular complexity or fragment-based approaches. To systematically identify optimal synthesis routes, a multistage screening protocol is employed that concurrently balances reliability (confidence score), synthetic efficiency (route length), and economic feasibility (graph edit distance), thereby addressing a critical limitation of single-objective retrosynthesis planning algorithms. In the experiments, RetroScore achieved a 97.37% success rate in multistep retrosynthesis planning while successfully optimizing across these competing objectives. In subsequent molecular generation tasks, it outperformed six of seven established SA metrics, indicating its effectiveness in producing novel molecules with superior synthetic profiles.
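The interplay of confidence score, route length, and graph edit distance can be sketched as a staged route selection. The abstract does not give the actual cost function or screening thresholds, so everything here is an assumption made for illustration: the `Step` type, the additive cost with weight `ged_weight`, the `min_confidence` cutoff, and the order of the stages.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    confidence: float   # single-step model confidence, in (0, 1]
    edit_distance: int  # graph edit distance for this reaction step


def route_cost(route, ged_weight=0.1):
    """Assumed cost: negative log-confidence plus weighted edit distance,
    summed over all steps of the route."""
    return sum(
        -math.log(s.confidence) + ged_weight * s.edit_distance for s in route
    )


def screen_routes(routes, min_confidence=0.5, ged_weight=0.1):
    """Stage 1: drop routes containing a low-confidence step.
    Stage 2: keep only the shortest remaining routes.
    Stage 3: return the candidate with the lowest total cost."""
    reliable = [
        r for r in routes if all(s.confidence >= min_confidence for s in r)
    ]
    if not reliable:
        return None
    shortest = min(len(r) for r in reliable)
    candidates = [r for r in reliable if len(r) == shortest]
    return min(candidates, key=lambda r: route_cost(r, ged_weight))


# Hypothetical candidate routes for one target molecule.
route_a = [Step(0.9, 2), Step(0.8, 3)]
route_b = [Step(0.95, 1), Step(0.4, 1)]             # fails stage 1
route_c = [Step(0.9, 6), Step(0.9, 5)]              # higher edit distance
route_d = [Step(0.9, 1), Step(0.9, 1), Step(0.9, 1)]  # longer route
best = screen_routes([route_a, route_b, route_c, route_d])
```

In this toy setting `route_a` is selected: it survives the confidence filter, ties for the shortest length with `route_c`, and has the lower combined cost.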

Journal Article

MolPrice: assessing synthetic accessibility of molecules based on market value

by Hastedt, Friedrich; Hellgardt, Klaus; del Rio Chanona, Antonio in Accessibility, Algorithms, Benchmarks

2025

Machine learning approaches for conceptualizing and designing in silico compounds have attracted significant attention. However, the applicability of these compounds is often challenged by synthetic viability and cost-effectiveness. Researchers introduced proxy scores, known as synthetic accessibility scoring, to quantify the ease of synthesis for virtual molecules. Despite their utility, existing synthetic accessibility tools have notable limitations: they overlook compound purchasability, lack physical interpretability, and often rely on imperfect computer-aided synthesis planning algorithms. We introduce MolPrice, an accurate and fast model for molecular price prediction. Utilizing self-supervised contrastive learning, MolPrice autonomously generates price labels for synthetically complex molecules, enabling the model to generalize to molecules beyond the training distribution. Our results show that MolPrice reliably assigns higher prices to synthetically complex molecules than to readily purchasable ones, effectively distinguishing different levels of synthetic accessibility. Furthermore, MolPrice achieves competitive performance on literature benchmarks for synthetic accessibility. To demonstrate its practical utility, we conduct a virtual screening case study, illustrating how MolPrice successfully identifies purchasable molecules from a large candidate library. MolPrice bridges the gap between generative molecular design and real-world feasibility by integrating cost-awareness into synthetic accessibility assessment, making it a powerful model to accelerate molecular discovery. Scientific contribution: We introduce MolPrice, a machine learning model that predicts molecular price as a proxy for synthetic accessibility. Unlike existing approaches, MolPrice integrates cost-awareness into accessibility assessment, allowing it to distinguish readily purchasable molecules from synthetically complex ones. The model is computationally efficient, and our results demonstrate its suitability for large-scale virtual screening. This work thus provides a practical tool to prioritize cheap and synthetically viable compounds in early-stage discovery workflows.

Journal Article

Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration

by Shen, Qing; Lou, Jungang; Chen, Lin in Algorithms, Analysis, Bioinformatics

2023

The flourishing of machine learning and deep learning methods has boosted the development of cheminformatics, especially in drug discovery and new material exploration. Lower time and space costs make it possible for scientists to search the enormous chemical space. Recently, some work combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the properties of generated small molecules, which notably improved a batch of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules are difficult to synthesize despite having better desired properties such as binding affinity. Nevertheless, RNN-based frameworks reproduce the molecule distribution of the training set better than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a lightweight pipeline called Magicmol; this pipeline has a re-mastered RNN network and utilizes the SELFIES representation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncation strategies to eliminate the model collapse problem. Additionally, adopting the SELFIES representation made it possible to combine STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.

Journal Article
