Catalogue Search | MBRL

Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments

by Ko, Junsu; Ashyrmamatov, Islambek; Lee, Juyong in 639/638/403; 639/638/549; 639/638/630

2022

Designing efficient synthetic routes for a target molecule remains a major challenge in organic synthesis. Atom environments are ideal, stand-alone, chemically meaningful building blocks that provide a high-resolution molecular representation. Our approach mimics chemical reasoning and predicts reactant candidates by learning the changes in atom environments associated with a chemical reaction. Through careful inspection of reactant candidates, we demonstrate that atom environments are promising descriptors for studying reaction route prediction and discovery. Here, we present RetroTRAE, a new single-step retrosynthesis prediction method that is free from all SMILES-based translation issues. It yields a top-1 accuracy of 58.3% on the USPTO test dataset, and its top-1 accuracy reaches 61.6% when highly similar analogs are included, outperforming other state-of-the-art neural machine translation-based methods. Our methodology introduces a novel scheme for using fragmental and topological descriptors as natural inputs for retrosynthetic prediction tasks. Reaction route planning remains a major challenge in organic synthesis. The authors present a retrosynthetic prediction model using the fragment-based representation of molecules and the Transformer architecture in neural machine translation.
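
Top-1 accuracy is conventionally defined as the fraction of test products whose recorded reactant set appears as the first-ranked prediction; top-k extends this to the first k predictions. A minimal Python sketch of that metric, assuming exact matching of hashable reactant sets. The toy identifiers are hypothetical placeholders, not RetroTRAE's atom-environment tokens.

```python
from typing import Hashable, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[Hashable]],
                   ground_truth: Sequence[Hashable], k: int) -> float:
    """Fraction of targets whose true answer is among the first k predictions."""
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need exactly one ranked prediction list per target")
    if not ground_truth or k < 1:
        raise ValueError("need a non-empty test set and k >= 1")
    hits = sum(1 for preds, truth in zip(ranked_predictions, ground_truth)
               if truth in preds[:k])
    return hits / len(ground_truth)


# Hypothetical toy data: each answer is a frozenset of reactant identifiers,
# so the order in which reactants are listed does not matter.
truth = [frozenset({"A", "B"}), frozenset({"C"}), frozenset({"D", "E"})]
preds = [
    [frozenset({"B", "A"}), frozenset({"A"})],  # correct at rank 1
    [frozenset({"X"}), frozenset({"C"})],       # correct at rank 2
    [frozenset({"Y"}), frozenset({"Z"})],       # never correct
]
print(top_k_accuracy(preds, truth, k=1))  # 1 of 3 targets solved
print(top_k_accuracy(preds, truth, k=2))  # 2 of 3 targets solved
```

Using frozensets makes the comparison order-independent, which is one reasonable choice when a product has several reactants.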

Journal Article

AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks

by Ko, Junsu; Kwon, Yongbeom; Lee, Juyong in Computer-Aided Design; Databases, Protein; Deep Learning

2020

Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design, and many binding affinity prediction methods have therefore been developed. In recent years, as deep learning has become more powerful, it has also been applied to affinity prediction. In this work, we develop a new neural network model that predicts the binding affinity of a protein-ligand complex structure. Our model predicts the binding affinity of a complex using an ensemble of multiple independently trained networks, each consisting of multiple channels of 3-D convolutional neural network layers. The model was trained on 3772 protein-ligand complexes from the refined set of the PDBbind-2016 database and tested on the core set of 285 complexes. The benchmark results show that the Pearson correlation coefficient between the binding affinities predicted by our model and the experimental data is 0.827, higher than that of state-of-the-art binding affinity prediction scoring functions. Additionally, our method ranks the relative binding affinities of multiple possible binders of a protein quite accurately, comparably to other scoring functions. Last, we measured which structural information is critical for predicting binding affinity and found that the complementarity between the protein and ligand is the most important.
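
The abstract describes an ensemble of independently trained networks evaluated by Pearson correlation against experiment. The sketch below shows only that evaluation arithmetic. It assumes the ensemble output is the plain mean of the member predictions (the abstract does not say how members are combined) and uses hypothetical numbers in place of real 3D-CNN outputs.

```python
import math
from typing import List, Sequence


def ensemble_mean(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average each complex's predicted affinity over all ensemble members.

    per_model[m][c] is model m's prediction for complex c (assumed layout).
    """
    if not per_model:
        raise ValueError("need at least one model")
    n_complexes = len(per_model[0])
    if any(len(preds) != n_complexes for preds in per_model):
        raise ValueError("every model must score the same complexes")
    return [sum(column) / len(per_model) for column in zip(*per_model)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical predictions (pKd-like units) from three members for four complexes.
members = [
    [6.1, 7.9, 4.8, 9.0],
    [5.7, 8.3, 5.2, 8.6],
    [6.0, 8.0, 5.0, 9.1],
]
experimental = [5.9, 8.4, 4.6, 9.3]
consensus = ensemble_mean(members)
print(consensus)
print(round(pearson_r(consensus, experimental), 3))
```

Averaging is the simplest combination rule; a weighted or median consensus would slot into `ensemble_mean` without changing the correlation step.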

Journal Article

A-Prot: protein structure modeling using MSA transformer

by Ko, Junsu; Hong, Yiyu; Lee, Juyong in Algorithms; Bioinformatics; Biomedical and Life Sciences

2022

Background The accuracy of protein 3D structure prediction has been dramatically improved by advances in deep learning. In the recent CASP14, DeepMind demonstrated that the new version of AlphaFold (AF) produces highly accurate 3D models that are close to experimental structures. The success of AF shows that the multiple sequence alignment (MSA) of a sequence contains rich evolutionary information that leads to accurate 3D models. Despite the success of AF, only its prediction code is open, and training a similar model requires vast computational resources. Thus, developing a lighter prediction model is still necessary.
Results In this study, we propose A-Prot, a new protein 3D structure modeling method using MSA Transformer, one of the state-of-the-art protein language models. For a given MSA, an MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions. We demonstrated that A-Prot predicts long-range contacts better than existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than those of most top server groups in CASP14.
Conclusion These results imply that A-Prot accurately captures the evolutionary and structural information of proteins at relatively low computational cost. Thus, A-Prot can provide clues for the development of other protein property prediction methods.
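
Long-range contact prediction, where A-Prot is reported to improve on existing methods, is typically scored by the precision of the top-ranked long-range residue pairs. This sketch assumes the common CASP conventions (a contact is a pair closer than 8 Å, long range means a sequence separation of at least 24, and the top L/5 pairs are scored). It also assumes the predicted distance distributions have already been reduced to a contact-probability matrix. None of these choices is taken from the abstract.

```python
from typing import Sequence


def long_range_precision(prob: Sequence[Sequence[float]],
                         true_dist: Sequence[Sequence[float]],
                         fraction: float = 0.2,
                         min_sep: int = 24,
                         cutoff: float = 8.0) -> float:
    """Precision of the top (fraction * L) long-range contact predictions.

    prob[i][j]: predicted contact probability for residues i and j (assumed input).
    true_dist[i][j]: native inter-residue distance in angstroms.
    """
    length = len(prob)
    # Consider each long-range pair once (upper triangle, j - i >= min_sep).
    pairs = [(prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short to contain long-range pairs")
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[:max(1, int(length * fraction))]
    correct = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return correct / len(top)


# Hypothetical 30-residue example: three confident predictions, two of them true.
L = 30
prob = [[0.0] * L for _ in range(L)]
dist = [[20.0] * L for _ in range(L)]
prob[0][29], prob[1][28], prob[2][27] = 0.9, 0.8, 0.7
dist[0][29], dist[1][28] = 5.0, 6.0
print(long_range_precision(prob, dist))  # scores the top L/5 = 6 pairs
```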

Journal Article

GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions

by Ko, Junsu; Seok, Chaok; Park, Hahnbeom in Algorithms; Bioinformatics; Biomedical and Life Sciences

2012

Background Protein structures can be reliably predicted by template-based modeling (TBM) when experimental structures of homologous proteins are available. However, it is challenging to obtain structures more accurate than the single best template, either by combining information from multiple templates or by modeling regions that vary among templates or are not covered by any template.
Results We introduce GalaxyTBM, a new TBM method in which the more reliable core region is first modeled from multiple templates, and less reliable, variable local regions, such as loops or termini, are then detected and re-modeled by an ab initio method. This TBM method is based on “Seok-server,” which was tested in CASP9 and assessed to be among the top TBM servers. The accuracy of the initial core modeling is enhanced by focusing on more conserved regions in the multiple-template selection and multiple sequence alignment stages. Additional improvement is achieved by ab initio modeling of up to 3 unreliable local regions within the fixed framework of the core structure. Overall, GalaxyTBM reproduced the performance of Seok-server: when tested on 68 single-domain CASP9 TBM targets, GalaxyTBM and Seok-server achieved average GDT-TS scores of 68.1 and 68.4, respectively. For application to multi-domain proteins, GalaxyTBM must be combined with domain-splitting methods.
Conclusion Application of GalaxyTBM to CASP9 targets demonstrates that accurate protein structure prediction is possible with a multiple-template-based approach, and that ab initio modeling of variable regions can further enhance model quality.
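
GDT-TS, the score used above to compare GalaxyTBM with Seok-server, averages the percentage of Cα atoms lying within 1, 2, 4, and 8 Å of their native positions: GDT-TS = 100 × (P1 + P2 + P4 + P8) / 4. The real score searches over many superpositions to maximize each fraction separately. This simplified sketch assumes a single fixed superposition and takes precomputed per-residue deviations as input.

```python
from typing import Sequence


def gdt_ts_fixed_superposition(deviations: Sequence[float]) -> float:
    """GDT-TS from per-residue C-alpha deviations (angstroms) under ONE superposition.

    Simplification: the official score maximizes each fraction over many
    superpositions, so this value is a lower bound on the true GDT-TS.
    """
    n = len(deviations)
    if n == 0:
        raise ValueError("need at least one residue")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue model: fractions are 1/4, 2/4, 3/4, 3/4.
print(gdt_ts_fixed_superposition([0.5, 1.5, 3.0, 10.0]))  # 56.25
```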

Journal Article

Improving docking and virtual screening performance using AlphaFold2 multi-state modeling for kinases

by Ha, Junsu; Ko, Junsu; Lee, Juyong in Accuracy; AlphaFold2; Benchmarks

2024

Structure-based virtual screening (SBVS) is a crucial computational approach in drug discovery, but its performance is sensitive to structural variations. Kinases, which are major drug targets, exemplify this challenge because different inhibitor types induce conformational changes in their active sites. Most experimentally determined kinase structures are in the DFG-in state, potentially biasing SBVS toward type I inhibitors and limiting the discovery of diverse scaffolds. To address these challenges, we introduce a multi-state modeling (MSM) protocol for AlphaFold2 (AF2) kinase structures that uses state-specific templates. Our comprehensive benchmarks evaluate the quality of the predicted models, binding pose prediction accuracy, and hit compound identification through ensemble SBVS. The results demonstrate that MSM models exhibit comparable or improved structural accuracy relative to standard AF2 models, enhancing pose prediction accuracy and effectively capturing kinase-ligand interactions. In virtual screening experiments, our MSM approach consistently outperforms standard AF2 and AF3 modeling, particularly in identifying diverse hit compounds. This study highlights the potential of MSM to broaden kinase inhibitor discovery by facilitating the identification of chemically diverse inhibitors, offering a promising solution to the structural bias problem in kinase-targeted drug discovery.

Journal Article

Substructure-based neural machine translation for retrosynthetic prediction

by Ko, Junsu; Kang, Taek; Lee, Juyong in Attention; Chemical reactions; Chemistry

2021

With the rapid improvement of machine translation approaches, neural machine translation has started to play an important role in retrosynthesis planning, which seeks reasonable synthetic pathways for a target molecule. Previous studies showed that the sequence-to-sequence frameworks of neural machine translation are a promising way to tackle the retrosynthetic planning problem. In this work, we recast retrosynthetic planning as a language translation problem using a template-free sequence-to-sequence model. The model is trained in an end-to-end, fully data-driven fashion. Unlike previous models that translate the SMILES strings of reactants and products, we introduce a new way of representing a chemical reaction based on molecular fragments. We demonstrate that the new approach yields better prediction results than current state-of-the-art computational methods. The new approach resolves major drawbacks of existing retrosynthetic methods, such as the generation of invalid SMILES strings. Specifically, our approach predicts highly similar reactant molecules with an accuracy of 57.7%. In addition, our method yields more robust predictions than existing methods.
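
The 57.7% figure above counts predictions that are highly similar to, rather than identical with, the recorded reactants. One common way to score this is Tanimoto (Jaccard) similarity between sets of substructure identifiers, combined with a threshold. The sketch below uses that approach under stated assumptions: the fragment strings and the 0.85 threshold are hypothetical, and the abstract does not specify the exact similarity measure.

```python
from typing import Iterable, Sequence


def tanimoto(a: Iterable[str], b: Iterable[str]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of fragment identifiers."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty descriptions are treated as identical
    return len(sa & sb) / len(sa | sb)


def similar_hit_rate(predicted: Sequence[Iterable[str]],
                     reference: Sequence[Iterable[str]],
                     threshold: float = 0.85) -> float:
    """Fraction of predictions whose similarity to the reference meets the threshold.

    The default threshold is a hypothetical choice for illustration.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need matching, non-empty prediction and reference lists")
    hits = sum(1 for p, r in zip(predicted, reference)
               if tanimoto(p, r) >= threshold)
    return hits / len(reference)


# Hypothetical fragment identifiers standing in for a real substructure vocabulary.
pred = [{"c1ccccc1", "C(=O)O", "CN"}, {"CCO", "Cl"}]
ref = [{"c1ccccc1", "C(=O)O", "CN"}, {"CCO", "Br"}]
print(tanimoto(pred[1], ref[1]))    # 1 shared fragment out of 3 distinct
print(similar_hit_rate(pred, ref))  # only the first prediction passes
```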

Journal Article

S-Pred: protein structural property prediction using MSA transformer

by Ko, Junsu; Hong, Yiyu; Lee, Juyong in 631/114/1305; 631/114/2411; 631/45/535

2022

Predicting the local structural features of a protein from its amino acid sequence helps reveal its function and assists in three-dimensional structural modeling. As the sequence-structure gap grows, prediction methods have been developed to bridge it. Additionally, as the size of structural databases and computing power have increased, the performance of these methods has also improved significantly. Herein, we present S-Pred, a powerful new tool that predicts eight-state secondary structures (SS8), accessible surface areas (ASAs), and intrinsically disordered regions (IDRs) from a given sequence. For feature prediction, S-Pred uses the multiple sequence alignment (MSA) of a query sequence as input. The MSA input is converted to features by the MSA Transformer, a protein language model that uses an attention mechanism. A long short-term memory (LSTM) network then produces the final prediction. The performance of S-Pred was evaluated on several test sets, and the program consistently provided accurate predictions. The SS8 prediction accuracy was approximately 76%, and the Pearson correlation between the experimental and predicted ASAs was 0.84. Additionally, IDRs could be accurately predicted with an F1-score of 0.514. The program is freely available as code at https://github.com/arontier/S_Pred_Paper and as a web server at https://ad3.io.
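
The SS8 accuracy and IDR F1-score quoted above are per-residue metrics. A minimal sketch of both, assuming predictions and labels are aligned residue by residue: SS8 as one DSSP-style letter per residue, and disorder as one boolean per residue. The example labels are hypothetical.

```python
from typing import Sequence


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted SS8 state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty label strings")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def binary_f1(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """F1-score for the positive class (here: disordered residues)."""
    if len(predicted) != len(observed):
        raise ValueError("need equal-length label sequences")
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    if tp == 0:
        return 0.0  # no true positives: F1 treated as 0.0 here
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels for a 10-residue segment (C marks coil in this toy alphabet).
ss8_pred, ss8_true = "HHHHGTTEEC", "HHHHHTTEEC"
disorder_pred = [True, True, False, False, False, False, False, False, True, False]
disorder_true = [True, False, False, False, False, False, False, False, True, True]
print(per_residue_accuracy(ss8_pred, ss8_true))
print(binary_f1(disorder_pred, disorder_true))
```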

Journal Article

Comparative transcriptome analysis identified candidate genes involved in mycelium browning in Lentinula edodes

by Markkandan, Kesavan; Lee, Hwa-Yong; Ji, Sumin in Animal Genetics and Genomics; Biomedical and Life Sciences; Brown film

2019

Background Lentinula edodes is one of the most popular edible mushroom species in the world and contains useful medicinal components, such as lentinan. The light-induced formation of a brown film on the vegetative mycelial tissues of L. edodes is an important process for ensuring the quantity and quality of this edible mushroom. To understand the molecular mechanisms underlying this critical developmental process in L. edodes, we characterized the morphological phenotypic changes in a strain, Chamaram, associated with abnormal brown film formation and compared its genome-wide transcriptional features.
Results In the present study, we performed genome-wide transcriptome analyses of different vegetative mycelium growth phenotypes, namely early white, normal brown, and defective dark yellow partial brown film phenotypes, which were exposed to different light conditions. The analysis identified clusters of genes specific to the light-induced brown film phenotypes. These genes were significantly associated with light sensing via photoreceptors such as FMN- and FAD-binding proteins, signal transduction by kinases and GPCRs, melanogenesis via activation of tyrosinases, and cell wall degradation by glucanases, chitinases, and laccases, suggesting that these processes are involved in mycelial browning in L. edodes. Interestingly, hydrophobin genes such as SC1 and SC3 exhibited divergent expression levels in the normal and abnormal brown mycelial films, indicating the ability of these genes to act in fruiting body initiation and in the formation of dikaryotic mycelia. Furthermore, we identified up-regulation of glycoside hydrolase domain-containing genes in the normal brown film but not in the abnormal film phenotype, suggesting that cell wall degradation in the normal brown film phenotype is crucial in the developmental processes related to the initiation and formation of fruiting bodies.
Conclusions This study systematically analysed the expression patterns of light-induced browning-related genes in L. edodes. Our findings provide information for further investigations of browning formation mechanisms in L. edodes and a foundation for future L. edodes breeding.

Journal Article

Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks

by Ha, Junsu; Chandrasekaran, Ramakrishnan; Lee, Juyong in Affinity; Benchmarks; Binding

2024

We introduce an advanced model for predicting protein–ligand interactions that combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein–ligand complexes. To overcome these limitations, we propose a novel approach that fuses three independent neural network models. One classification model performs binary classification of a given protein–ligand complex pose. The other two regression models are trained to predict the binding affinity and the root-mean-square deviation of the ligand conformation from an input complex structure. We trained the model to account for both deviations between experimental and predicted binding affinities and pose prediction uncertainties. By effectively integrating the outputs of the triplet neural networks with a physics-based scoring function, our model showed significantly improved performance in hit identification. Benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. Our model achieved top 1% enrichment factors of 32.7 and 23.1 on the CASF2016 and DUD-E benchmark sets, respectively. Benchmark results on the LIT-PCBA set further confirmed its higher average enrichment factors, emphasizing the model’s efficiency and generalizability. The model’s efficiency was further validated by identifying 23 active compounds from 63 candidates in experimental screening for autotaxin inhibitors, demonstrating its practical applicability in hit discovery.
Scientific contribution Our work introduces a novel training strategy for a protein–ligand binding affinity prediction model that integrates the outputs of three independent sub-models and utilizes expertly crafted decoy sets. The model shows exceptional performance across multiple benchmarks, and its high enrichment factors on the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
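
The top 1% enrichment factor reported above compares the hit rate among the best-scored 1% of a ranked library with the hit rate of the whole library: EF = (actives in top slice / size of top slice) / (total actives / library size). A minimal sketch, assuming higher scores are better by default and that the top slice is rounded down to at least one compound (benchmarks differ on this detail). The scores are hypothetical.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      top_fraction: float = 0.01,
                      higher_is_better: bool = True) -> float:
    """EF = (active fraction in the top slice) / (active fraction in the whole library)."""
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or len(is_active) != n_total or n_actives == 0:
        raise ValueError("need matching, non-empty inputs with at least one active")
    n_top = max(1, int(n_total * top_fraction))
    # Rank compound indices from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)
    actives_in_top = sum(1 for i in order[:n_top] if is_active[i])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Hypothetical screen of 200 compounds with 2 actives; compound 199 scores best.
scores = [float(i) for i in range(200)]
labels = [False] * 200
labels[199] = labels[0] = True
print(enrichment_factor(scores, labels))  # one of the two top-1% picks is active
```

For energy-like scores where lower is better, such as many physics-based docking scores, pass `higher_is_better=False`.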

Journal Article

Intercellular cross-talk through lineage-specific gap junction of cancer-associated fibroblasts related to stromal fibrosis and prognosis

by Kim, Wonkyung; Oh, Ji-Hye; Ko, Junsu in 631/67; 692/53; Cancer

2023

Stromal fibrosis in cancer is usually associated with poor prognosis and chemotherapy resistance. It is thought to be caused by fibroblasts; however, the exact mechanism is not yet well understood. This study aimed to identify lineage-specific cancer-associated fibroblast (CAF) subgroups and their associations with extracellular matrix remodeling and clinical significance in various tumor types, using single-cell and bulk RNA sequencing data. Unsupervised clustering identified six subclusters of CAFs, including one with exclusively high expression of gap junction protein beta-2 (GJB2). This cluster, named GJB2-positive CAFs, was found to be a unique subgroup of terminally differentiated CAFs associated with collagen gene expression and extracellular matrix remodeling. GJB2-positive CAFs showed higher communication frequency with vascular endothelial cells and cancer cells than GJB2-negative CAFs. Moreover, GJB2 was poorly expressed in normal tissues, indicating that its expression depends on interaction with other cells, including vascular endothelial cells and cancer cells. Finally, the study investigated the clinical significance of a GJB2 signature score for GJB2-positive CAFs in cancer and found a correlation with poor prognosis. These results suggest that GJB2-positive CAFs are a unique fibroblast subtype involved in extracellular matrix remodeling, with significant clinical implications in cancer.

Journal Article