Catalogue Search | MBRL

Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments

by Ko, Junsu , Ashyrmamatov, Islambek , Lee, Juyong in 639/638/403 , 639/638/549 , 639/638/630

2022

Designing efficient synthetic routes for a target molecule remains a major challenge in organic synthesis. Atom environments are ideal, stand-alone, chemically meaningful building blocks providing a high-resolution molecular representation. Our approach mimics chemical reasoning and predicts reactant candidates by learning the changes of atom environments associated with the chemical reaction. Through careful inspection of reactant candidates, we demonstrate atom environments as promising descriptors for studying reaction route prediction and discovery. Here, we present a new single-step retrosynthesis prediction method, RetroTRAE, which is free from all SMILES-based translation issues and yields a top-1 accuracy of 58.3% on the USPTO test dataset. The top-1 accuracy reaches 61.6% when highly similar analogs are included, outperforming other state-of-the-art neural machine translation-based methods. Our methodology introduces a novel scheme for fragmental and topological descriptors to be used as natural inputs for retrosynthetic prediction tasks. Reaction route planning remains a major challenge in organic synthesis. The authors present a retrosynthetic prediction model using the fragment-based representation of molecules and the Transformer architecture in neural machine translation.

Journal Article
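
The abstract above reports top-1 accuracy twice: once for exact matches and once with highly similar analogs counted as hits. The following is a minimal sketch of how two such figures could be computed, assuming each molecule is represented as a set of fragment labels and that similarity is the Tanimoto coefficient on those sets. The function names, the 0.75 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fragment labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top1_accuracy(predictions, references, threshold=1.0):
    """Fraction of top-ranked predictions whose similarity to the reference
    is at least `threshold` (1.0 means an exact set match)."""
    hits = sum(
        tanimoto(pred, ref) >= threshold
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Toy data: each molecule is a set of made-up fragment labels.
references = [{"f1", "f2", "f3"}, {"f4", "f5"}, {"f6", "f7", "f8", "f9"}]
predictions = [{"f1", "f2", "f3"}, {"f4", "f6"}, {"f6", "f7", "f8"}]

exact = top1_accuracy(predictions, references)                   # exact matches only
relaxed = top1_accuracy(predictions, references, threshold=0.75)  # analogs count as hits
```

Lowering the threshold can only keep or raise the accuracy, which is consistent with the relaxed figure in the abstract being the higher of the two.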

AK-Score: Accurate Protein-Ligand Binding Affinity Prediction Using an Ensemble of 3D-Convolutional Neural Networks

by Ko, Junsu , Kwon, Yongbeom , Lee, Juyong in Computer-Aided Design , Databases, Protein , Deep Learning

2020

Accurate prediction of the binding affinity of a protein-ligand complex is essential for efficient and successful rational drug design. Therefore, many binding affinity prediction methods have been developed. In recent years, as deep learning technology has become more powerful, it has also been applied to affinity prediction. In this work, a new neural network model that predicts the binding affinity of a protein-ligand complex structure is developed. Our model predicts the binding affinity of a complex using an ensemble of multiple independently trained networks that consist of multiple channels of 3-D convolutional neural network layers. Our model was trained using the 3772 protein-ligand complexes from the refined set of the PDBbind-2016 database and tested using the core set of 285 complexes. The benchmark results show that the Pearson correlation coefficient between the binding affinities predicted by our model and the experimental data is 0.827, which is higher than that of state-of-the-art binding affinity prediction scoring functions. Additionally, our method ranks the relative binding affinities of multiple possible binders of a protein quite accurately, comparable to the other scoring functions. Last, we measured which structural information is critical for predicting binding affinity and found that the complementarity between the protein and ligand is most important.

Journal Article
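
The abstract above describes averaging several independently trained networks and scoring the result with the Pearson correlation coefficient. A minimal sketch of those two steps in plain Python follows. It assumes each network outputs one predicted affinity per complex; the function names and all numbers are invented for illustration and do not come from the paper.

```python
import math


def ensemble_mean(per_model_predictions):
    """Average the predictions of several models, complex by complex."""
    n_models = len(per_model_predictions)
    return [sum(values) / n_models for values in zip(*per_model_predictions)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy numbers: three hypothetical networks scoring four complexes.
per_model = [
    [5.0, 6.5, 7.0, 8.5],
    [5.5, 6.0, 7.5, 8.0],
    [4.5, 7.0, 8.0, 9.0],
]
experimental = [5.2, 6.4, 7.1, 8.8]

averaged = ensemble_mean(per_model)
r = pearson_r(averaged, experimental)
```

Averaging is only one way to combine an ensemble; the abstract does not state which combination rule the authors used.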

A-Prot: protein structure modeling using MSA transformer

by Ko, Junsu , Hong, Yiyu , Lee, Juyong in Algorithms , Bioinformatics , Biomedical and Life Sciences

2022

Background: The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results: In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than those of most top server groups of CASP14. Conclusion: These results imply that A-Prot accurately captures the evolutionary and structural information of proteins with relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods.

Journal Article

Random Forest-Based Protein Model Quality Assessment (RFMQA) Using Structural Features and Potential Energy Terms

by Manavalan, Balachandran , Lee, Juyong , Lee, Jooyoung in Artificial intelligence , Bioinformatics , Biology and Life Sciences

2014

Recently, predicting a protein's three-dimensional (3D) structure from its sequence information has made significant progress due to advances in computational techniques and the growth in the number of experimental structures. However, selecting good models from a structural model pool is an important and challenging task in protein structure prediction. In this study, we present the first application of random forest based model quality assessment (RFMQA) to rank protein models using their structural features and knowledge-based potential energy terms. The method predicts a relative score of a model by using its secondary structure, solvent accessibility and knowledge-based potential energy terms. We trained and tested the RFMQA method on CASP8 and CASP9 targets using 5-fold cross-validation. The correlation coefficient between the TM-score of the model selected by RFMQA (TMRF) and the best server model (TMbest) is 0.945. We benchmarked our method on recent CASP10 targets by using CASP8 and 9 server models as a training set. The correlation coefficient and average difference between TMRF and TMbest over 95 CASP10 targets are 0.984 and 0.0385, respectively. The test results show that our method works better in selecting top models when compared with other top performing methods. RFMQA is available for download from http://lee.kias.re.kr/RFMQA/RFMQA_eval.tar.gz.

Journal Article

Impact on Predictive Performance of Air Pollutants in PV Forecasting Using Multi-Model Ensemble Learning: Evidence from the Port Logistics Hinterland Area

by Ahn, Jungmin , Lee, Juyong in Air pollution , Electric power production , Electric power systems

2025

The uncertainty of photovoltaic (PV) power generation can impact the stability and flexibility of the power grid. Thus, accurately forecasting PV power output is crucial for ensuring a stable power system and supporting next-generation policy decisions. The purpose of this study is to examine how the PV power generation forecasting model performed both with and without the addition of particulate matter (PM) and greenhouse gas (GHG) concentration factors to meteorological data. In this study, PV power generation is forecasted using various machine learning models. The results indicate that there was no significant difference in forecasting accuracy whether PM and GHG variables were included or not. In addition, the stacked ensemble model has the lowest root mean square error (RMSE) and mean absolute error (MAE) values for all datasets and shows improved performance compared to the single model. Stacked ensembles that include a combination of meteorological, PM, and GHG variables perform the best. However, the optimal datasets varied across models. Therefore, this study concluded that meteorological variables had the greatest influence on the PV generation forecasting performance. Among the additional factors, PM contributed more significantly to the improvement in forecasting performance than GHG.

Journal Article
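
The abstract above compares models by root mean square error (RMSE) and mean absolute error (MAE). A minimal sketch of the two metrics and of picking the model with the lowest RMSE is given below. It covers only the scoring step, not the stacking itself; the model names and all forecast values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Toy data: observed PV output and two hypothetical sets of forecasts.
actual = [10.0, 12.0, 9.0, 15.0]
forecasts = {
    "single_model": [11.0, 10.0, 9.0, 17.0],
    "stacked_ensemble": [10.5, 11.5, 9.5, 15.5],
}

# Score every model, then select the one with the lowest RMSE.
scores = {name: (rmse(actual, f), mae(actual, f)) for name, f in forecasts.items()}
best = min(scores, key=lambda name: scores[name][0])
```

RMSE penalises large errors more heavily than MAE, so reporting both, as the abstract does, shows whether a model's advantage depends on a few large misses.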

Low Power CMOS-Based Hall Sensor with Simple Structure Using Double-Sampling Delta-Sigma ADC

by Oh, Sein , Oh, Younggyun , Lee, Juyong in delta-sigma ADC , double sampling , hall sensor

2020

A CMOS (complementary metal-oxide-semiconductor) Hall sensor with low power consumption and simple structure is introduced. The tiny magnetic signal from the Hall device can be detected by a high-resolution delta-sigma ADC in the presence of offset and flicker noise. Also, the offset as well as the flicker noise are effectively suppressed by the current spinning technique combined with double sampling switches of the ADC. The double sampling scheme of the ADC reduces the operating frequency and helps to reduce the power consumption. The prototype Hall sensor is fabricated in a 0.18-µm CMOS process, and the measurement shows a detection range of ±150 mT and a sensitivity of 110 µV/mT. The size of the active area is 0.7 mm², and the total power consumption is 4.9 mW. The proposed system is advantageous not only for low power consumption, but also for small sensor size due to its simplicity.

Journal Article
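
The reported sensitivity and detection range in the abstract above imply a full-scale signal level, which the short calculation below works out. It is a sketch that assumes a perfectly linear response over the whole ±150 mT range; the abstract does not say at which node the sensitivity is measured, and the constant and function names are hypothetical.

```python
SENSITIVITY_UV_PER_MT = 110.0  # reported sensitivity, in µV per mT
RANGE_MT = 150.0               # reported detection range is ±150 mT


def output_voltage_uv(field_mt):
    """Output voltage in µV for a field in mT, assuming a linear response."""
    if abs(field_mt) > RANGE_MT:
        raise ValueError("field outside the reported detection range")
    return SENSITIVITY_UV_PER_MT * field_mt


# Signal at the positive edge of the range, converted from µV to mV.
full_scale_mv = output_voltage_uv(RANGE_MT) / 1000.0
```

Under these assumptions the signal spans about ±16.5 mV, which gives a sense of why the abstract emphasises a high-resolution ADC and suppression of offset and low-frequency noise.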

Identifying the Hot Spot Residues of the SARS-CoV-2 Main Protease Using MM-PBSA and Multiple Force Fields

by Lee, Juyong , Byun, Jinyoung in Accuracy , Binding energy , Binding sites

2021

In this study, we investigated the binding affinities between the main protease of the SARS-CoV-2 virus (Mpro) and its various ligands to identify the hot spot residues of the protease. To benchmark the influence of various force fields on hot spot residue identification and binding free energy calculation, we performed MD simulations followed by MM-PBSA analysis with three different force fields: CHARMM36, AMBER99SB, and GROMOS54a7. We performed 100 ns MD simulations for 11 protein–ligand complexes. From the series of MD simulations and MM-PBSA calculations, we found that the MM-PBSA estimates obtained with different force fields are weakly correlated with each other. In a comparison between the force fields, the AMBER99SB and GROMOS54a7 results are fairly correlated, while the CHARMM36 results show weak or almost no correlation with the others. Our results suggest that MM-PBSA analysis results strongly depend on force fields and should be interpreted carefully. Additionally, through energy decomposition analysis, we identified the hot spot residues of Mpro, which play critical roles in ligand binding. The residues of the S4 subsite of the binding site, N142, M165, and R188, contribute strongly to ligand binding. In addition, the terminal residues D295, R298, and Q299 have attractive interactions with ligands via electrostatic and solvation energy. We believe that our findings will help facilitate the development of novel inhibitors of SARS-CoV-2.

Journal Article

Improving docking and virtual screening performance using AlphaFold2 multi-state modeling for kinases

by Ha, Junsu , Ko, Junsu , Lee, Juyong in Accuracy , AlphaFold2 , Benchmarks

2024

Structure-based virtual screening (SBVS) is a crucial computational approach in drug discovery, but its performance is sensitive to structural variations. Kinases, which are major drug targets, exemplify this challenge due to active site conformational changes caused by different inhibitor types. Most experimentally determined kinase structures have the DFGin state, potentially biasing SBVS towards type I inhibitors and limiting the discovery of diverse scaffolds. We introduce a multi-state modeling (MSM) protocol for AlphaFold2 (AF2) kinase structures using state-specific templates to address these challenges. Our comprehensive benchmarks evaluate predicted model qualities, binding pose prediction accuracy, and hit compound identification through ensemble SBVS. Results demonstrate that MSM models exhibit comparable or improved structural accuracy compared to standard AF2 models, enhancing pose prediction accuracy and effectively capturing kinase-ligand interactions. In virtual screening experiments, our MSM approach consistently outperforms standard AF2 and AF3 modeling, particularly in identifying diverse hit compounds. This study highlights the potential of MSM in broadening kinase inhibitor discovery by facilitating the identification of chemically diverse inhibitors, offering a promising solution to the structural bias problem in kinase-targeted drug discovery.

Journal Article

Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

by Ucak, Umit V. , Ashyrmamatov, Islambek , Lee, Juyong in Analysis , Atom-in-SMILES , Chemical language processing

2023

Tokenization is an important preprocessing step in natural language processing that may have a significant influence on prediction quality. This research showed that traditional SMILES tokenization has a limitation: its tokens fail to reflect the true nature of molecules. To address this issue, we developed the atom-in-SMILES tokenization scheme, which eliminates ambiguities in the generic nature of SMILES tokens. Our results in multiple chemical translation and molecular property prediction tasks demonstrate that proper tokenization has a significant impact on prediction quality. In terms of prediction accuracy and token degeneration, atom-in-SMILES is a more effective method for generating higher-quality SMILES sequences from AI-based chemical models than other tokenization and representation schemes. We investigated the degrees of token degeneration of various schemes and analyzed their adverse effects on prediction quality. Additionally, token-level repetitions were quantified, and generated examples were incorporated for qualitative examination. We believe that the atom-in-SMILES tokenization has great potential for adoption by a broad range of related scientific communities, as it provides chemically accurate, tailor-made tokens for molecular property prediction, chemical translation, and molecular generative models.

Journal Article
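
To illustrate the limitation the abstract above attributes to traditional SMILES tokenization, the sketch below splits a SMILES string into atom-level tokens with a simplified regular expression and counts repeated tokens. It shows conventional tokenization only, not the atom-in-SMILES scheme; the pattern is a reduced version written for this example and does not cover every SMILES feature.

```python
import re
from collections import Counter

# Simplified atom-wise SMILES pattern: bracket atoms, two-letter halogens,
# organic-subset atoms, two-digit ring closures, single symbols, ring digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|[()=#+\-\\/:.@~*$]|\d"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unmatched characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


tokens = tokenize("CC(=O)O")  # acetic acid
counts = Counter(tokens)      # how often each token repeats
```

In this example the methyl carbon and the carboxyl carbon both become the same token `C`, even though their chemical environments differ, which is the kind of ambiguity the abstract says atom-in-SMILES tokens are designed to remove.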

Reconstruction of lossless molecular representations from fingerprints

by Ucak, Umit V. , Ashyrmamatov, Islambek , Lee, Juyong in Analysis , Chemistry , Chemistry and Materials Science

2023

The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.

Journal Article
