Catalogue Search | MBRL

DockRMSD: an open-source tool for atom mapping and RMSD calculation of symmetric molecules through graph isomorphism

by Bell, Eric W. , Zhang, Yang in Algorithms , Analysis , Caffeine

2019

Comparison of ligand poses generated by protein–ligand docking programs has often been carried out with the assumption of direct atomic correspondence between ligand structures. However, this correspondence is not necessarily chemically relevant for symmetric molecules and can lead to an artificial inflation of ligand pose distance metrics, particularly those that depend on receptor superposition (rather than ligand superposition), such as docking root mean square deviation (RMSD). Several of the commonly-used RMSD calculation algorithms that correct for molecular symmetry do not take into account the bonding structure of molecules and can therefore result in non-physical atomic mapping. Here, we present DockRMSD, a docking pose distance calculator that converts the symmetry correction to a graph isomorphism searching problem, in which the optimal atomic mapping and RMSD calculation are performed by an exhaustive and fast matching search of all isomorphisms of the ligand structure graph. We show through evaluation of docking poses generated by AutoDock Vina on the CSAR Hi-Q set that DockRMSD is capable of deterministically identifying the minimum symmetry-corrected RMSD and is able to do so without significant loss of computational efficiency compared to other methods. The open-source DockRMSD program can be conveniently integrated with various docking pipelines to assist with accurate atomic mapping and RMSD calculations, which can therefore help improve docking performance, especially for ligand molecules with complicated structural symmetry.
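The idea in the abstract above, taking the minimum RMSD over all bond-preserving atom mappings without superposing the ligand, can be illustrated with a minimal sketch. This is not the DockRMSD implementation: the function name, the input layout (element list, bond list, two coordinate lists sharing one atom indexing) and the naive backtracking search are all assumptions made for illustration.

```python
import math

def symmetry_corrected_rmsd(elements, bonds, ref_xyz, pose_xyz):
    """Minimum RMSD over all bond-preserving atom mappings (no superposition).

    Hypothetical helper: both poses are assumed to share one atom indexing,
    so valid mappings are the automorphisms of the ligand graph.
    """
    n = len(elements)
    adj = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    best = [math.inf]       # smallest summed squared distance found so far
    mapping = [-1] * n      # mapping[i] = pose atom assigned to reference atom i
    used = [False] * n

    def sq_dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(ref_xyz[i], pose_xyz[j]))

    def extend(i, total):
        if total >= best[0]:            # prune: cannot beat the best mapping
            return
        if i == n:
            best[0] = total
            return
        for j in range(n):
            if used[j] or elements[j] != elements[i] or len(adj[j]) != len(adj[i]):
                continue
            # bonding to every already-mapped atom must be preserved
            if any((k in adj[i]) != (mapping[k] in adj[j]) for k in range(i)):
                continue
            mapping[i], used[j] = j, True
            extend(i + 1, total + sq_dist(i, j))
            used[j] = False

    extend(0, 0.0)
    return math.sqrt(best[0] / n)
```

For a toy three-atom fragment with two equivalent oxygens, swapping the oxygen labels in the pose leaves this value at zero, whereas an index-by-index RMSD would report a spurious deviation.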

Journal Article

I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction

by Zhang, Yang , Zhou, Xiaogen , Zheng, Wei in Amino acid sequence , Amino acids , Annotations

2022

Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone. A protocol is described for predicting the structures and functions of multi-domain proteins using the freely available deep-learning-based web platform I-TASSER-MTD.

Journal Article

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

by Bell, Eric W. , Zhang, Yang , Zhou, Xiaogen in Accuracy , Biology and Life Sciences , Coevolution

2021

The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
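The top-L and top-L/5 long-range precision figures quoted above can be made concrete with a small sketch of how such a metric is typically computed. The function name and input layout are hypothetical, and the sequence-separation cutoff of 24 residues for "long-range" is the common CASP convention, assumed here because the abstract does not state it.

```python
def top_l_precision(pred, true_contacts, seq_len, k=1, min_sep=24):
    """Precision of the top L/k long-range contact predictions.

    pred: dict {(i, j): probability} with i < j (hypothetical layout).
    true_contacts: set of (i, j) pairs in contact in the native structure.
    min_sep: assumed sequence-separation cutoff for "long-range".
    """
    # keep only long-range pairs, ranked by predicted probability
    ranked = sorted(
        (pair for pair in pred if pair[1] - pair[0] >= min_sep),
        key=lambda pair: pred[pair],
        reverse=True,
    )
    top = ranked[: max(1, seq_len // k)]
    if not top:
        return 0.0
    return sum(pair in true_contacts for pair in top) / len(top)
```

Calling it with `k=1` gives top-L precision and with `k=5` gives top-L/5 precision for a protein of length `seq_len`.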

Journal Article

EDock: blind protein–ligand docking by replica-exchange monte carlo simulation

by Bell, Eric W. , Zhang, Wenyi , Zhang, Yang in Annotations , Binding sites , Blind docking

2020

Protein–ligand docking is an important approach for virtual screening and protein function annotation. Although many docking methods have been developed, most require a high-resolution crystal structure of the receptor and a user-specified binding site to start. This information is, however, not available for the majority of unknown proteins, including many pharmaceutically important targets. Developing blind docking methods without predefined binding sites and working with low-resolution receptor models from protein structure prediction is thus essential. In this manuscript, we propose a novel Monte Carlo based method, EDock, for blind protein–ligand docking. For a given protein, binding sites are first predicted by sequence-profile and substructure-based comparison searches with initial ligand poses generated by graph matching. Next, replica-exchange Monte Carlo (REMC) simulations are performed for ligand conformation refinement under the guidance of a physical force field coupled with binding-site distance constraints. The method was tested on two large-scale datasets containing 535 protein–ligand pairs. Without specifying binding pockets on the experimental receptor structures, EDock achieves on average a ligand RMSD of 2.03 Å, which compares favorably with state-of-the-art docking methods including DOCK6 (2.68 Å) and AutoDock Vina (3.92 Å). When starting with predicted models from I-TASSER, EDock still generates reasonable docking models, with a success rate 159% and 67% higher than DOCK6 and AutoDock Vina, respectively. Detailed data analyses show that the major advantage of EDock lies in reliable ligand binding site predictions and extensive REMC sampling, which allows for the implementation of multiple van der Waals weightings to accommodate different levels of steric clashes and cavity distortions and therefore enhances the robustness of low-resolution docking with predicted protein structures.
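As a rough illustration of the replica-exchange Monte Carlo (REMC) sampling mentioned above, the sketch below runs a toy one-dimensional search. It is not EDock's code: the function names, the scalar state and the uniform step move are assumptions; in a docking setting the state would be a ligand pose and the energy a force field with binding-site restraints. Only the Metropolis rule and the standard replica-swap criterion are meant to carry over.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Standard Metropolis rule for a move that changes the energy by delta_e."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)

def swap_probability(e_i, e_j, kT_i, kT_j):
    """Probability of exchanging the configurations of two replicas."""
    delta = (1.0 / kT_i - 1.0 / kT_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def remc(energy, x0, temperatures, n_sweeps, step=0.5, seed=0):
    """Toy REMC over a scalar state; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    xs = [x0] * len(temperatures)
    es = [energy(x0)] * len(temperatures)
    best_x, best_e = x0, es[0]
    for _ in range(n_sweeps):
        # one Metropolis move per replica at its own temperature
        for r, kT in enumerate(temperatures):
            trial = xs[r] + rng.uniform(-step, step)
            e_trial = energy(trial)
            if metropolis_accept(e_trial - es[r], kT, rng):
                xs[r], es[r] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial
        # attempt exchanges between neighbouring temperatures
        for r in range(len(temperatures) - 1):
            p = swap_probability(es[r], es[r + 1], temperatures[r], temperatures[r + 1])
            if rng.random() < p:
                xs[r], xs[r + 1] = xs[r + 1], xs[r]
                es[r], es[r + 1] = es[r + 1], es[r]
    return best_x, best_e
```

The high-temperature replicas cross energy barriers more easily, and accepted swaps pass their configurations down to the low-temperature replicas for refinement.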

Journal Article

Recombinant Penicillium oxalicum 16 β-Glucosidase 1 Displays Comprehensive Inhibitory Resistance to Several Lignocellulose Pretreatment Products, Ethanol, and Salt

by Zhao, Xihua , Li, Hanxin , Shi, Yi in Acids , Biofuels , Cellobiase

2020

β-Glucosidase (BGL) is a rate-limiting enzyme of lignocellulose hydrolysis for second-generation bioethanol production, but its inhibition by lignocellulose pretreatment products, ethanol, and salt is apparent. Here, the recombinant Penicillium oxalicum 16 BGL 1 (rPO16BGL1) from Pichia pastoris GS115 kept complete activity at 0.2–1.4 mg/mL furan derivatives and phenolic compounds, 50 mg/mL sodium chloride (potassium chloride), or 100 mg/mL ethanol at 40 °C. rPO16BGL1 retained above 50% residual activity at 30 mg/mL organic acid sodium, and 60% residual activity at 40 °C with 300 mg/mL ethanol. Sodium chloride and potassium chloride had a complicated effect on rPO16BGL1, which resulted in activation or inhibition. The inhibition kinetics of the enzyme reaction demonstrated that organic acids and organic acid sodium were non-competitive inhibitors and that ethanol was a competitive inhibitor at < 1.5 mg/mL salicin. Moreover, substrate inhibition of the enzyme was found at > 2 mg/mL salicin, and the Km/KI and Km/KSI average values revealed that the inhibitory strength was ranked as salicin-organic acids > organic acids > salicin-organic acid sodium salt > organic acid sodium salt > salicin > salicin-KCl > salicin-NaCl > salicin-ethanol > ethanol.
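The inhibition modes named above (competitive, non-competitive and substrate inhibition) correspond to textbook Michaelis-Menten rate laws, sketched below. The abstract does not give the equations or fitted constants, so the function names and any parameter values passed in are illustrative assumptions, not the authors' fitted model.

```python
def rate_competitive(s, i, vmax, km, ki):
    """Competitive inhibition: the inhibitor raises the apparent Km."""
    return vmax * s / (km * (1.0 + i / ki) + s)

def rate_noncompetitive(s, i, vmax, km, ki):
    """Pure non-competitive inhibition: the inhibitor lowers the apparent Vmax."""
    return vmax * s / ((km + s) * (1.0 + i / ki))

def rate_substrate_inhibition(s, vmax, km, ksi):
    """Substrate inhibition: excess substrate (constant Ksi) depresses the rate."""
    return vmax * s / (km + s + s * s / ksi)
```

With the inhibitor concentration set to zero, the first two functions reduce to the plain Michaelis-Menten rate, which is a quick sanity check when fitting data.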

Journal Article

Classification models distinguish functional and trafficking effects of KCNQ1 variants to enhance variant interpretation

by George, Jr, Alfred L , Ledwitch, Kaitlyn V , Desai, Reshma R in Biophysics

2025

Missense mutations compromise protein fitness by altering stability and function, which can lead to various clinical disease states. The potassium ion channel KCNQ1 underlies the majority of congenital long QT syndrome (LQTS) cases, one of the most common genetic arrhythmia syndromes. During genetic testing for LQTS, variants of uncertain significance (VUS) confound diagnosis and clinical management. KCNQ1 protein fitness metrics enable mechanistic classification of variants, directly informing the molecular basis for dysfunction and providing clinical interpretation of variants linked to LQTS and other channelopathies. We developed structure-aware random forest classifier models to predict seven metrics of KCNQ1 fitness: four functional electrophysiology measurements (peak current density, voltage-dependence, gating kinetics) and three trafficking values measured by flow cytometry. Our trained models outperformed AlphaMissense in predicting protein fitness, enhancing interpretation of ClinVar VUS and variants classified as ambiguous by AlphaMissense. We demonstrate that the classifiers distinguish benign and pathogenic variants from ClinVar and gnomAD and identify systematic patterns of dysfunction and mistrafficking along the functionally critical S4 helix. Our method advances variant effect prediction with a mechanistic classifier that reliably links missense mutations in KCNQ1 to their specific disease-causing mechanisms. As a resource for precision medicine approaches for LQTS or other KCNQ1 channelopathies, we provide the predictions and scores for all KCNQ1 missense variants across the structured region of the protein.

Journal Article

Protein structure and sequence re-analysis of 2019-nCoV genome does not indicate snakes as its intermediate host or the unique similarity between its spike protein insertions and HIV-1

by Zhang, Yang , Zhou, Xiaogen , Zhang, Chengxin in Amino acid sequence , Bioinformatics , Computer applications

2020

As the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, careful analysis of its transmission and cellular mechanisms is sorely needed. In this report, we re-analyzed the computational approaches and findings presented in two recent manuscripts by Ji et al. (https://doi.org/10.1002/jmv.25682) and by Pradhan et al. (https://doi.org/10.1101/2020.01.30.927871), which concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions shared a unique similarity to HIV-1. Results from our re-implementation of the analyses, built on larger-scale datasets using state-of-the-art bioinformatics methods and databases, do not support the conclusions proposed by these manuscripts. Based on our analyses and existing data of coronaviruses, we concluded that the intermediate hosts of 2019-nCoV are more likely to be mammals and birds than snakes, and that the "novel insertions" observed in the spike protein are naturally evolved from bat coronaviruses.

Journal Article

PEPPI: Whole-proteome protein-protein interaction prediction through structure and sequence similarity, functional association, and machine learning

by Schwartz, Jacob H , Zhang, Yang , Freddolino, Peter L in Bayesian analysis , Complementarity , Computer applications

2021

Proteome-wide identification of protein-protein interactions is a formidable task which has yet to be sufficiently addressed by experimental methodologies. Many computational methods have been developed to predict proteome-wide interaction networks, but few leverage both the sensitivity of structural information and the wide availability of sequence data. We present PEPPI, a pipeline which integrates structural similarity, sequence similarity, functional association data, and machine learning-based classification through a naive Bayesian classifier model to accurately predict protein-protein interactions at a proteomic scale. Through benchmarking against a set of 798 ground truth interactions and an equal number of non-interactions, we have found that PEPPI attains 4.5% higher AUROC than the best of other state-of-the-art methods. As a proteomic-scale application, PEPPI was applied to model the interactions which occur between SARS-CoV-2 and human host cells during coronavirus infection, where 403 high-confidence interactions were identified with predictions covering 73% of a gold standard dataset from PSICQUIC and demonstrating significant complementarity with the most recent high-throughput experiments. PEPPI is available both as a webserver and in a standalone version and should be a powerful and generally applicable tool for computational screening of protein-protein interactions.

Competing Interest Statement: The authors have declared no competing interest.

Footnotes: https://zhanggroup.org/PEPPI/
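The PEPPI abstract above describes combining several independent lines of evidence through a naive Bayesian classifier. A minimal sketch of that general idea follows; the function name, the per-feature likelihood ratios and the prior are all hypothetical, and the abstract does not specify how PEPPI derives or weights its evidence terms.

```python
import math

def naive_bayes_posterior(likelihood_ratios, prior=0.5):
    """Posterior probability of interaction from independent evidence.

    likelihood_ratios: one positive ratio P(evidence | interact) /
    P(evidence | not interact) per feature (hypothetical inputs, e.g.
    structural similarity, sequence similarity, functional association).
    prior: assumed prior probability that the pair interacts.
    """
    # naive Bayes assumption: features are conditionally independent,
    # so their log likelihood ratios simply add to the prior log-odds
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A ratio above 1 pushes the posterior toward "interacting" and a ratio below 1 pushes it away, so weak evidence from several sources can add up to a confident call.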

Paper

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

by Zhang, Yang , Zhou, Xiaogen , Zheng, Wei in Accuracy , Bioinformatics , Computer applications

2020

Abstract: The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP and CAMEO experiments, and outperformed other state-of-the-art methods by at least 58.4% for the CASP 11&12 and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.

Availability: The training and testing data, standalone package, and the online server for TripletRes are available at https://zhanglab.ccmb.med.umich.edu/TripletRes/.

Author Summary: Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate significant advancement of the proposed methods over other approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.

Competing Interest Statement: The authors have declared no competing interest.

Footnotes: † These authors should be regarded as Joint First Authors. https://zhanglab.ccmb.med.umich.edu/TripletRes/

Paper

FakeRotLib: expedient non-canonical amino acid parameterization in Rosetta

by Brown, Benjamin P , Meiler, Jens , Bell, Eric W in Amino acids , Bioinformatics , Deep learning

2025

Non-canonical amino acids (NCAAs) occupy an important place, both in natural biology and synthetic applications. However, modeling these amino acids still lies outside the capabilities of most deep learning methods due to sparse training datasets for this task. Instead, biophysical methods such as Rosetta can excel in modeling NCAAs. We discuss the various aspects of parameterizing an NCAA for use in Rosetta, identifying rotamer distribution modeling as one of the most impactful factors of NCAA parameterization on Rosetta performance. To this end, we also present FakeRotLib, a method which uses statistical fitting of small-molecule conformers to create rotamer distributions. We find that FakeRotLib outperforms existing methods in a fraction of the time and is able to parameterize NCAA types previously unmodeled by Rosetta.

Journal Article
