Catalogue Search | MBRL

Randomized SMILES strings improve the quality of molecular generative models

by Reymond, Jean-Louis , Engkvist, Ola , Bjerrum, Esben Jannik in Artificial neural networks , Benchmarking , Benchmarks

2019

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and represent the target chemical space more accurately. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models with a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.

Journal Article

The multivariate mixture dynamics model: shifted dynamics and correlation skew

by Rapisarda, Francesco , Brigo, Damiano , Pisani, Camilla in Arbitrage , Calibration , Correlation

2021

The multivariate mixture dynamics (MVMD) model is a tractable, dynamical, arbitrage-free multivariate model characterized by transparency on the dependence structure, since closed-form formulae for terminal correlations, average correlations and the copula function are available. It also allows for complete decorrelation between assets and instantaneous variances. Each single asset is modelled according to a lognormal mixture dynamics model, and this univariate version is widely used in the industry due to its flexibility and accuracy. The same property holds for the multivariate process of all assets, whose density is a mixture of multivariate basic densities. This allows for consistency of single asset and index/portfolio smile. In this paper, we generalize the MVMD model by introducing shifted dynamics and we propose a definition of implied correlation under this model. We investigate whether the model is able to consistently reproduce the implied volatility of FX cross rates once the single components are calibrated to univariate shifted lognormal mixture dynamics models. We consider in particular the case of the Chinese Renminbi FX rate, showing that the shifted MVMD model correctly recovers the CNY/EUR smile given the EUR/USD smile and the USD/CNY smile, thus highlighting that the model can also work as an arbitrage-free volatility smile extrapolation tool for cross currencies that may not be liquid or fully observable. We compare the performance of the shifted MVMD model in terms of implied correlation with that of the shifted simply correlated mixture dynamics model, where the dynamics of the single assets are connected naively by introducing correlation among their Brownian motions. Finally, we introduce a model with uncertain volatilities and correlation. The Markovian projection of this model is a generalization of the shifted MVMD model.

Journal Article
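
To make the univariate building block mentioned in the abstract above concrete, the following is a minimal sketch of a European call price under a shifted lognormal mixture. It assumes constant component volatilities, a single deterministic shift applied to the forward, and Black-type pricing of each component; the function names (`black_call`, `shifted_mixture_call`) and all parameters are hypothetical and are not taken from the paper. It does not cover the multivariate MVMD model or its implied correlation.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(forward, strike, vol, maturity, discount=1.0):
    """Black (1976) price of a European call on a lognormal forward."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(forward / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    return discount * (forward * norm_cdf(d1) - strike * norm_cdf(d2))


def shifted_mixture_call(forward, strike, weights, vols, maturity,
                         shift=0.0, discount=1.0):
    """Call price when the terminal value is shift + X, with X a mixture of
    lognormals that all share the mean (forward - shift).

    Assumption (illustrative only): the price is the weighted sum of Black
    prices evaluated on the shifted forward and the shifted strike.
    """
    if not weights or len(weights) != len(vols):
        raise ValueError("weights and vols must be non-empty and equal length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    if forward <= shift or strike <= shift:
        raise ValueError("forward and strike must exceed the shift")
    return sum(
        w * black_call(forward - shift, strike - shift, v, maturity, discount)
        for w, v in zip(weights, vols)
    )


# Usage: a two-component mixture with a small positive shift.
price = shifted_mixture_call(100.0, 105.0, [0.6, 0.4], [0.15, 0.35], 1.0, shift=5.0)
```

Because the payoff on `shift + X` with strike `K` equals the payoff on `X` with strike `K - shift`, the shift only moves the forward and strike passed to each component price; the mixture weights then blend low- and high-volatility components to shape the smile.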

POETRY: A Smile for Strange Fruit

by Lateef Mc Leod in Smiles

2012

Journal Article

SMILES-based deep generative scaffold decorator for de-novo drug design

by Reymond, Jean-Louis , Engkvist, Ola , Bjerrum, Esben Jannik in Algorithms , Architecture , Artificial neural networks

2020

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.

Journal Article

BIOPEP-UWM Database of Bioactive Peptides: Current Opportunities

by Iwaniak, Anna , Darewicz, Małgorzata , Minkiewicz, Piotr in Amino Acid Sequence , Amino acids , Amino Acids - chemistry

2019

The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially those derived from foods that are constituents of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) is in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.

Journal Article
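
As an illustration of the sequence-to-SMILES conversion mentioned in the abstract above, here is a minimal sketch that concatenates residue fragments along the peptide backbone. The residue table, the function name `peptide_to_smiles`, and the restriction to five amino acids without stereochemistry are assumptions made for this example; this is not the BIOPEP-UWM implementation, which also handles D-enantiomers.

```python
# Backbone fragment for each residue: N, alpha carbon (with side chain), C(=O).
# Illustrative subset only; stereochemistry is deliberately omitted.
RESIDUE_SMILES = {
    "G": "NCC(=O)",          # glycine
    "A": "NC(C)C(=O)",       # alanine
    "S": "NC(CO)C(=O)",      # serine
    "V": "NC(C(C)C)C(=O)",   # valine
    "L": "NC(CC(C)C)C(=O)",  # leucine
}


def peptide_to_smiles(sequence):
    """Convert a one-letter amino acid sequence to a non-stereo SMILES string.

    Fragments are joined so that each carbonyl carbon bonds to the next
    residue's nitrogen (the peptide bond); a final "O" completes the
    C-terminal carboxylic acid.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    parts = []
    for residue in sequence.upper():
        if residue not in RESIDUE_SMILES:
            raise ValueError(f"unsupported residue: {residue!r}")
        parts.append(RESIDUE_SMILES[residue])
    return "".join(parts) + "O"


# Usage: the dipeptide Gly-Ala.
gly_ala = peptide_to_smiles("GA")
```

A fuller converter would need all standard residues (proline requires a ring closure) and chirality markers to distinguish L- from D-amino acids.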

A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction

by Öztürk, Hakime , Özgür, Arzucan , Ozkirimli, Elif in Algorithms , Bioinformatics , Biomedical and Life Sciences

2016

Background: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before. Results: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provided a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string representation of a molecule. Additionally, TF and TF-IDF based cosine similarity kernels are proposed. Conclusion: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.

Journal Article
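
The TF-IDF cosine similarity kernel described in the abstract above can be sketched as follows. The abstract does not say how SMILES strings are split into terms or which IDF variant is used, so the overlapping character n-gram tokenizer, the smoothed IDF formula, and all function names here are assumptions for illustration rather than the authors' method.

```python
import math
from collections import Counter


def char_ngrams(smiles, n=2):
    """Split a SMILES string into overlapping character n-grams (assumed tokenizer)."""
    if len(smiles) < n:
        return [smiles]
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def tfidf_vectors(smiles_list, n=2):
    """Return one sparse TF-IDF vector (dict: term -> weight) per SMILES string."""
    docs = [Counter(char_ngrams(s, n)) for s in smiles_list]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n_docs = len(docs)
    # Smoothed IDF (an assumption) so terms present in every string keep weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kernel_matrix(smiles_list, n=2):
    """Pairwise TF-IDF cosine similarity matrix over a list of SMILES strings."""
    vecs = tfidf_vectors(smiles_list, n)
    return [[cosine(u, v) for v in vecs] for u in vecs]


# Usage: ethanol, ethylamine and benzene.
K = kernel_matrix(["CCO", "CCN", "c1ccccc1"])
```

With bigrams, ethanol and ethylamine share the term `CC` and so receive a positive similarity, while benzene shares no term with either because SMILES is case-sensitive (aromatic `c` versus aliphatic `C`).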

SOMEDAY YOU'LL FIND ME: A Plan B Essay

by Pearlman, Edith in Smiles

2010

Journal Article

Eis-me aqui, Africa/Here I am, Africa

by Fonseca, Mário in Smiles

2010

Magazine Article

Evaluation of aesthetics of posed smiles based on smile-related characteristics

by Zhou, Guilong , Zhao, Jinlong , Tian, Lei in Adult , Aesthetics , China

2025

Purpose: The purpose of this study was to investigate the aesthetic evaluation of four smile-related characteristics among different genders and professional subgroups, including dental professionals (DPs), non-dental healthcare professionals (NDPs), and laypersons (LPs). Methods: Smile photographs were selected and digitally manipulated to determine changes in various smile aesthetic parameters (lip thickness ratio, smile line/smile index, upper lip curvature, and smile arc/dental curvature). These altered images were rated by Chinese participants (dental professionals, non-dental healthcare professionals, and laypersons). A total of 1469 subjects were recruited to complete the questionnaire. Smile aesthetics ratings were calculated, and comparisons between groups were made. Results: All respondents chose a 1:1.5 lip thickness ratio, an average smile line, upward upper lip curvature, and upward dental curvature (a consonant smile arc) parallel to the lower lip curvature as the most attractive. Dental professionals (DPs) focused more on smile aesthetics than the other groups (p < 0.01). Significant differences were detected in the perception of smile-related characteristics across gender and professional subgroups (p < 0.05). In addition, there were significant differences in the attractiveness ratings for smiles among professional subgroups (p < 0.05). The most important factor influencing smile aesthetics in the present study was the smile arc. Conclusion: Smile-related characteristics such as the lip thickness ratio, smile line, upper lip curvature, and smile arc are predominant factors influencing smile attractiveness and should be given priority when considering and managing aesthetic treatment plans. Females and DPs are more critical of smile aesthetics, and DPs also focus more on smile aesthetics than laypersons. It is therefore necessary to account for the influence of gender and profession on personal evaluation and treatment plans.

Journal Article

Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES

by Margreitter, Christian , Kolarova, Simona , Bjerrum, Esben Jannik in Computer programming , Deep learning , Optimization

2023

Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular-input line-entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, and offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs and the generation of molecules of higher similarity to known ligands.

Journal Article
