Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
391
result(s) for
"631/114/663"
Sort by:
SignalP 6.0 predicts all five types of signal peptides using protein language models
by
Johansen, Alexander Rosenberg
,
von Heijne, Gunnar
,
Almagro Armenteros, José Juan
in
631/114/2184
,
631/114/663/2009
,
631/337/474
2022
Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.
A new version of SignalP predicts all types of signal peptides.
Journal Article
AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination
by
Richardson, Jane S.
,
Read, Randy J.
,
Williams, Christopher J.
in
631/114/2411
,
631/114/663
,
631/1647/2258/1266
2024
Artificial intelligence-based protein structure prediction methods such as AlphaFold have revolutionized structural biology. The accuracies of these predictions vary, however, and they do not take into account ligands, covalent modifications or other environmental factors. Here, we evaluate how well AlphaFold predictions can be expected to describe the structure of a protein by comparing predictions directly with experimental crystallographic maps. In many cases, AlphaFold predictions matched experimental maps remarkably closely. In other cases, even very high-confidence predictions differed from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation. We suggest considering AlphaFold predictions as exceptionally useful hypotheses. We further suggest that it is important to consider the confidence in prediction when interpreting AlphaFold predictions and to carry out experimental structure determination to verify structural details, particularly those that involve interactions not included in the prediction.
An analysis of AlphaFold protein structure predictions shows that while in many cases the predictions are highly accurate, there are also many instances where the predicted structures or parts of predicted structures do not agree with experimentally resolved data. Therefore, care must be taken when using these predictions for informing structural hypotheses.
Journal Article
Clustering predicted structures at the scale of the known protein universe
2023
Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy
1
, and over 214 million predicted structures are available in the AlphaFold database
2
. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm—Foldseek cluster—that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations representing probable previously undescribed structures. Clusters without annotation tend to have few representatives covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.
The novel Foldseek clustering algorithm defines 2.30 million clusters of AlphaFold structures, identifying remote structural similarity of human immune-related proteins in prokaryotic species.
Journal Article
Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure
by
Livesey, Benjamin J.
,
Gerasimavicius, Lukas
,
Marsh, Joseph A.
in
631/114/663
,
631/208/737
,
Computer applications
2022
Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Taking protein structure into account has therefore provided great insight into the molecular mechanisms underlying human genetic disease. While there has been much focus on how mutations can disrupt protein structure and thus cause a loss of function (LOF), alternative mechanisms, specifically dominant-negative (DN) and gain-of-function (GOF) effects, are less understood. Here, we investigate the protein-level effects of pathogenic missense mutations associated with different molecular mechanisms. We observe striking differences between recessive
vs
dominant, and LOF
vs
non-LOF mutations, with dominant, non-LOF disease mutations having much milder effects on protein structure, and DN mutations being highly enriched at protein interfaces. We also find that nearly all computational variant effect predictors, even those based solely on sequence conservation, underperform on non-LOF mutations. However, we do show that non-LOF mutations could potentially be identified by their tendency to cluster in three-dimensional space. Overall, our work suggests that many pathogenic mutations that act via DN and GOF mechanisms are likely being missed by current variant prioritisation strategies, but that there is considerable scope to improve computational predictions through consideration of molecular disease mechanisms.
Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Here the authors analyse the locations of thousands of human disease mutations and their predicted effects on protein structure and show that,while loss-of-function mutations tend to be highly disruptive, non-loss-of-function mutations are in general much milder at a protein structural level.
Journal Article
US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes
2022
Structure comparison and alignment are of fundamental importance in structural biology studies. We developed the first universal platform, US-align, to uniformly align monomer and complex structures of different macromolecules—proteins, RNAs and DNAs. The pipeline is built on a uniform TM-score objective function coupled with a heuristic alignment searching algorithm. Large-scale benchmarks demonstrated consistent advantages of US-align over state-of-the-art methods in pairwise and multiple structure alignments of different molecules. Detailed analyses showed that the main advantage of US-align lies in the extensive optimization of the unified objective function powered by efficient heuristic search iterations, which substantially improve the accuracy and speed of the structural alignment process. Meanwhile, the universal protocol fusing different molecular and structural types helps facilitate the heterogeneous oligomer structure comparison and template-based protein–protein and protein–RNA/DNA docking.
US-align is a universal protocol for monomeric and oligomeric structural alignments of protein, RNA and DNA molecules, built on the coupling of a uniform TM-score objective function and the heuristic iterative searching algorithm.
Journal Article
flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions
2021
Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy should be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn’s webserver is available at
http://biomine.cs.vcu.edu/servers/flDPnn/
The authors present flDPnn, a computational tool for disorder and disorder function predictions from protein sequences. flDPnn was assessed with the data from the “Critical Assessment of Protein Intrinsic Disorder Prediction” experiment and on an independent and low-similarity test dataset, which show that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions.
Journal Article
SARS-CoV-2 variants, spike mutations and immune escape
2021
Although most mutations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome are expected to be either deleterious and swiftly purged or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity or interactions with host immunity. The emergence of SARS-CoV-2 in late 2019 was followed by a period of relative evolutionary stasis lasting about 11 months. Since late 2020, however, SARS-CoV-2 evolution has been characterized by the emergence of sets of mutations, in the context of ‘variants of concern’, that impact virus characteristics, including transmissibility and antigenicity, probably in response to the changing immune profile of the human population. There is emerging evidence of reduced neutralization of some SARS-CoV-2 variants by postvaccination serum; however, a greater understanding of correlates of protection is required to evaluate how this may impact vaccine effectiveness. Nonetheless, manufacturers are preparing platforms for a possible update of vaccine sequences, and it is crucial that surveillance of genetic and antigenic changes in the global virus population is done alongside experiments to elucidate the phenotypic impacts of mutations. In this Review, we summarize the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.The evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been characterized by the emergence of mutations and so-called variants of concern that impact virus characteristics, including transmissibility and antigenicity. In this Review, members of the COVID-19 Genomics UK (COG-UK) Consortium and colleagues summarize mutations of the SARS-CoV-2 spike protein, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.
Journal Article
Clustering huge protein sequence sets in linear time
2018
Metagenomic datasets contain billions of protein sequences that could greatly enhance large-scale functional annotation and structure prediction. Utilizing this enormous resource would require reducing its redundancy by similarity clustering. However, clustering hundreds of millions of sequences is impractical using current algorithms because their runtimes scale as the input set size
N
times the number of clusters
K
, which is typically of similar order as
N
, resulting in runtimes that increase almost quadratically with
N
. We developed Linclust, the first clustering algorithm whose runtime scales as
N
, independent of
K
. It can also cluster datasets several times larger than the available main memory. We cluster 1.6 billion metagenomic sequence fragments in 10 h on a single server to 50% sequence identity, >1000 times faster than has been possible before. Linclust will help to unlock the great wealth contained in metagenomic and genomic sequence databases.
Billions of metagenomic and genomic sequences fill up public datasets, which makes similarity clustering an important and time-critical analysis step. Here, the authors develop Linclust, an algorithm with linear time complexity that can cluster over a billion sequences within hours on a single server.
Journal Article
Inferring the molecular and phenotypic impact of amino acid variants with MutPred2
2020
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
Identifying variants capable of causing genetic disease is challenging. The authors use semisupervised learning to predict pathogenic missense variants and their impacts on protein structure and function, enabling a molecular mechanism-driven approach to studying different types of human disease.
Journal Article
Machine learning coarse-grained potentials of protein thermodynamics
2023
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
Understanding protein dynamics is a complex scientific challenge. Here, authors construct coarse-grained molecular potentials using artificial neural networks, significantly accelerating protein dynamics simulations while preserving their thermodynamics.
Journal Article