Catalogue Search | MBRL

Recent Progress of Protein Tertiary Structure Prediction

by Yihan Chen, Yifeng Shen, Jianzhao Gao in Accuracy, Algorithms, AlphaFold2

2024

The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.

Journal Article


Protein Structure Prediction: Conventional and Deep Learning Perspectives

by Jisna, V A; Jayaraj, P B in Amino acid sequence, Antibodies, Coevolution

2021

Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the most complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without relying on prior intuitions. Accurately predicted protein structures are employed for drug discovery, antibody designs, understanding protein–protein interactions, and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.

Journal Article


A Uniquely Stable Trimeric Model of SARS-CoV-2 Spike Transmembrane Domain

by Aliper, Elena T.; Polyansky, Anton A.; Efremov, Roman G. in Membranes, Molecular Dynamics Simulation, Peptides

2022

Understanding fusion mechanisms employed by SARS-CoV-2 spike protein entails realistic transmembrane domain (TMD) models, while no reliable approaches towards predicting the 3D structure of transmembrane (TM) trimers exist. Here, we propose a comprehensive computational framework to model the spike TMD only based on its primary structure. We performed amino acid sequence pattern matching and compared the molecular hydrophobicity potential (MHP) distribution on the helix surface against TM homotrimers with known 3D structures and selected an appropriate template for homology modeling. We then iteratively built a model of spike TMD, adjusting “dynamic MHP portraits” and residue variability motifs. The stability of this model, with and without palmitoyl modifications downstream of the TMD, and several alternative configurations (including a recent NMR structure), was tested in all-atom molecular dynamics simulations in a POPC bilayer mimicking the viral envelope. Our model demonstrated unique stability under the conditions applied and conforms to known basic principles of TM helix packing. The original computational framework looks promising and could potentially be employed in the construction of 3D models of TM trimers for a wide range of membrane proteins.

Journal Article


CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields

by Joo, Keehyoung; Sim, Sangjin; Lee, Juyong in Algorithms, Amino Acid Sequence, boosted regression trees

2022

Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at superfamily levels or twilight zone chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in terms of average alignment accuracies for validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where we could see that CRFalign leads to an improvement in average modeling accuracies in these hard targets (TM-CRFalign ≃42.94%) compared with that of HHalign (TM-HHalign ≃39.05%) and also that of MRFalign (TM-MRFalign ≃36.93%). CRFalign was incorporated into our template search framework called CRFpred and was tested on a random target set of 300 target proteins consisting of Easy, Medium and Hard sets, which showed a reasonable template search performance.

Journal Article


TRAVeLer: a tool for template-based RNA secondary structure visualization

by Elias, Richard; Hoksza, David in Algorithms, Bioinformatics, Biomedical and Life Sciences

2017

Background: Visualization of RNA secondary structures is a complex task, and, especially in the case of large RNA structures where the expected layout is largely habitual, the existing visualization tools often fail to produce suitable visualizations. This led us to the idea of using existing layouts as templates for the visualization of new RNAs, similarly to how templates are used in homology-based structure prediction. Results: This article introduces Traveler, a software tool enabling visualization of a target RNA secondary structure using an existing layout of a sufficiently similar RNA structure as a template. Traveler is based on an algorithm which converts the target and template structures into corresponding tree representations and utilizes tree edit distance coupled with layout modification operations to transform the template layout into the target one. Traveler thus accepts a pair of secondary structures and a template layout and outputs a layout for the target structure. Conclusions: Traveler is a command-line open source tool able to quickly generate layouts for even the largest RNA structures in the presence of a sufficiently similar layout. It is available at http://github.com/davidhoksza/traveler.

Journal Article


FALCON2: a web server for high-quality prediction of protein tertiary structures

by Sun, Shiwei; Bu, Dongbo; Kong, Lupeng in Ab initio prediction, Algorithms, Amino Acid Sequence

2021

Background: Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising. Results: In this study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide a high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, the FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches. Conclusions: By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.

Journal Article


HOMCOS: an updated server to search and model complex 3D structures

by Kawabata, Takeshi in Amino Acid Sequence , Biochemistry , Bioinformatics

2016

The HOMCOS server (http://homcos.pdbj.org) was updated for both searching and modeling the 3D complexes for all molecules in the PDB. As compared to the previous HOMCOS server, the current server targets all of the molecules in the PDB including proteins, nucleic acids, small compounds and metal ions. Their binding relationships are stored in the database. Five services are available for users. For the services “Modeling a Homo Protein Multimer” and “Modeling a Hetero Protein Multimer”, a user can input one or two proteins as the queries, while for the service “Protein-Compound Complex”, a user can input one chemical compound and one protein. The server searches for similar molecules by BLAST and KCOMBU. Based on each similar complex found, a simple sequence-replaced model is quickly generated by replacing the residue names and numbers with those of the query protein. A target compound is flexibly superimposed onto the template compound using the program fkcombu. If monomeric 3D structures are input as the query, then template-based docking can be performed. For the service “Searching Contact Molecules for a Query Protein”, a user inputs one protein sequence as the query, and then the server searches for its homologous proteins in PDB and summarizes their contacting molecules as the predicted contacting molecules. The results are summarized in “Summary Bars” or “Site Table” display. The latter shows the results as a one-site-one-row table, which is useful for annotating the effects of mutations. The service “Searching Contact Molecules for a Query Compound” is also available.

Journal Article


A Peptides Prediction Methodology for Tertiary Structure Based on Simulated Annealing

by Soto-Monterrubio, Diego A.; Frausto-Solís, Juan; Castilla-Valdez, Guadalupe in Algorithms, Amino acids, Energy

2021

The Protein Folding Problem (PFP) is a big challenge that has remained unsolved for more than fifty years. This problem consists of obtaining the tertiary structure or Native Structure (NS) of a protein knowing its amino acid sequence. The computational methodologies applied to this problem are classified into two groups, known as Template-Based Modeling (TBM) and ab initio models. In the latter methodology, only information from the primary structure of the target protein is used. In the literature, Hybrid Simulated Annealing (HSA) algorithms are among the best ab initio algorithms for PFP; Golden Ratio Simulated Annealing (GRSA) is a family of these algorithms designed for peptides. Algorithms designed with TBM, in contrast, use information from a target protein’s primary structure and information from similar or analogous proteins. This paper presents the GRSA-SSP methodology, which implements a secondary structure prediction to build an initial model and refines it with HSA algorithms. Additionally, we compare the performance of the GRSAX-SSP algorithms versus their corresponding GRSAX. Finally, our best algorithm GRSAX-SSP is compared with PEP-FOLD3, I-TASSER, QUARK, and Rosetta, showing that it is competitive for small peptides, except when predicting the largest peptides.

Journal Article


Template-Based Modelling of the Structure of Fungal Effector Proteins

by Hane, James K.; Mancera, Ricardo L.; Jones, Darcy A. B. in Amino Acid Sequence, animals, Biochemistry

2024

The discovery of new fungal effector proteins is necessary to enable the screening of cultivars for disease resistance. Sequence-based bioinformatics methods have been used for this purpose, but only a limited number of functional effector proteins have been successfully predicted and subsequently validated experimentally. A significant obstacle is that many fungal effector proteins discovered so far lack sequence similarity or conserved sequence motifs. The availability of experimentally determined three-dimensional (3D) structures of a number of effector proteins has recently highlighted structural similarities amongst groups of sequence-dissimilar fungal effectors, enabling the search for similar structural folds amongst effector sequence candidates. We have applied template-based modelling to predict the 3D structures of candidate effector sequences obtained from bioinformatics predictions and the PHI-BASE database. Structural matches were found not only with ToxA- and MAX-like effector candidates but also with non-fungal effector-like proteins—including plant defensins and animal venoms—suggesting the broad conservation of ancestral structural folds amongst cytotoxic peptides from a diverse range of distant species. Accurate modelling of fungal effectors was achieved using RaptorX. The utility of predicted structures of effector proteins lies in the prediction of their interactions with plant receptors through molecular docking, which will improve the understanding of effector–plant interactions.

Journal Article


A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11

by Cao, Renzhi; Cheng, Jianlin; Li, Jilong in Algorithms, Amino Acid Sequence, Analysis

2015

Background: With more and more protein sequences produced in the genomic era, predicting protein structures from sequences becomes very important for elucidating the molecular details and functions of these proteins for biomedical research. Traditional template-based protein structure prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining the best templates, alignments, and models. Methods: We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of protein structure prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target protein. In the second step, it used a large number of protein model quality assessment methods to evaluate and rank the models in the protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking. Results: The method was implemented as two protein structure prediction servers, MULTICOM-CONSTRUCT and MULTICOM-CLUSTER, that participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the best 10 server predictors. Conclusions: The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.

Journal Article

