Catalogue Search | MBRL

Highly accurate protein structure prediction for the human proteome

by Nikolov, Stanislav , Senior, Andrew W. , Zielinski, Michal in 631/114/1305 , 631/114/2411 , 631/1647/2067

2021

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective. AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.

Journal Article

Ten simple rules to make your computing more environmentally sustainable

by Bateman, Alex , Lannelongue, Loïc , Inouye, Michael in Carbon dioxide , Carbon Dioxide - analysis , Carbon footprint

2021

[...]trees are essential to eliminate excess CO2; on average, a mature tree can sequester 11,000 gCO2 per year [12]. Since it depends on the energy needed to power the computer and the carbon footprint of producing such energy, it can be calculated fairly accurately. [...]the end-to-end environmental impact of computers and data centres is substantial but difficult to quantify. [...]try to use your gear for as long as is reasonable.
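The relationship described in this excerpt (footprint depends on the energy drawn and on the carbon intensity of producing that energy) can be sketched as a short calculation. This is a minimal illustration, not the authors' tool: the function names and the example values of 100 kWh and 220 gCO2 per kWh are assumptions chosen for the arithmetic; only the figure of 11,000 gCO2 per mature tree per year comes from the excerpt.

```python
# Figure quoted in the excerpt: one mature tree sequesters about 11,000 gCO2 per year.
TREE_SEQUESTRATION_G_PER_YEAR = 11_000


def carbon_footprint_g(energy_kwh: float, carbon_intensity_g_per_kwh: float) -> float:
    """Footprint in gCO2 = energy used (kWh) x carbon intensity of that energy (gCO2/kWh)."""
    return energy_kwh * carbon_intensity_g_per_kwh


def tree_years(footprint_g: float) -> float:
    """Number of tree-years needed to sequester the given footprint."""
    return footprint_g / TREE_SEQUESTRATION_G_PER_YEAR


if __name__ == "__main__":
    # Hypothetical job: 100 kWh on a grid emitting 220 gCO2 per kWh (both values assumed).
    footprint = carbon_footprint_g(energy_kwh=100.0, carbon_intensity_g_per_kwh=220.0)
    print(f"{footprint:.0f} gCO2, about {tree_years(footprint):.1f} tree-years")
```

Real carbon intensity varies by country and energy mix, so the second argument should come from a published figure for the relevant grid.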

Journal Article

Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes

by Bateman, Alex , Smyshlyaev, Georgy , Barabas, Orsolya in Annotations , Antibiotic resistance , Antibiotics

2021

Mobile genetic elements (MGEs) sequester and mobilize antibiotic resistance genes across bacterial genomes. Efficient and reliable identification of such elements is necessary to follow resistance spreading. However, automated tools for MGE identification are missing. Tyrosine recombinase (YR) proteins drive MGE mobilization and could provide markers for MGE detection, but they constitute a diverse family also involved in housekeeping functions. Here, we conducted a comprehensive survey of YRs from bacterial, archaeal, and phage genomes and developed a sequence‐based classification system that dissects the characteristics of MGE‐borne YRs. We revealed that MGE‐related YRs evolved from non‐mobile YRs by acquisition of a regulatory arm‐binding domain that is essential for their mobility function. Based on these results, we further identified numerous unknown MGEs. This work provides a resource for comparative analysis and functional annotation of YRs and aids the development of computational tools for MGE annotation. Additionally, we reveal how YRs adapted to drive gene transfer across species and provide a tool to better characterize antibiotic resistance dissemination. Synopsis: A systematic resource for tyrosine recombinase annotation is presented. Comparative sequence analysis of the protein family enables the functional classification of these enzymes and the identification of mobile genetic elements in bacterial genomes. Phylogenetic analysis of the tyrosine recombinase protein family classifies its members into twenty subgroups. Members of the subgroups have a specific function, sequence features and host taxonomy. Tyrosine recombinases of mobile genetic elements carry an additional arm‐binding domain. Tyrosine recombinase classification enables the identification of new mobile genetic elements in bacterial genomes.

Journal Article

Uncovering new families and folds in the natural protein universe

by Tenson, Tanel , Schwede, Torsten , Mets, Toomas in 42/44 , 631/114/1305 , 631/114/2184

2023

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology. The extent to which the AlphaFold database has structurally illuminated proteins that are challenging to annotate for function or putative biological role using standard homology-based approaches at high predicted accuracy is investigated.

Journal Article

Discovery of fibrillar adhesins across bacterial species

by Bateman, Alex , Lafita, Aleix , Monzon, Vivian in Adhesins , Adhesins, Bacterial - genetics , Adhesion

2021

Background: Fibrillar adhesins are long multidomain proteins that form filamentous structures at the cell surface of bacteria. They are an important yet understudied class of proteins composed of adhesive and stalk domains that mediate interactions of bacteria with their environment. This study aims to characterize fibrillar adhesins in a wide range of bacterial phyla and to identify new fibrillar adhesin-like proteins to improve our understanding of host-bacteria interactions. Results: Through careful literature and computational searches, we identified 82 stalk and 27 adhesive domain families in fibrillar adhesins. Based on the presence of these domains in the UniProt Reference Proteomes database, we identified and analysed 3,542 fibrillar adhesin-like proteins across species of the most common bacterial phyla. We further enumerate the adhesive and stalk domain combinations found in nature and demonstrate that fibrillar adhesins have complex and variable domain architectures, which differ across species. By analysing the domain architecture of fibrillar adhesins, we show that in Gram-positive bacteria, adhesive domains are mostly positioned at the N-terminus and cell surface anchors at the C-terminus of the protein, while their positions are more variable in Gram-negative bacteria. We provide an open repository of fibrillar adhesin-like proteins and domains to enable further studies of this class of bacterial surface proteins. Conclusion: This study provides a domain-based characterization of fibrillar adhesins and demonstrates that they are widely found in species across the main bacterial phyla. We have discovered numerous novel fibrillar adhesins and improved our understanding of pathogenic adhesion and invasion mechanisms.

Journal Article

Cryo-EM structures of human RNA polymerase III in its unbound and transcribing states

by Bateman, Alex , Grötsch, Helga , Girbig, Mathias in 101/28 , 45/70 , 631/337/1645

2021

RNA polymerase III (Pol III) synthesizes transfer RNAs and other short, essential RNAs. Human Pol III misregulation is linked to tumor transformation, neurodegenerative and developmental disorders, and increased sensitivity to viral infections. Here, we present cryo-electron microscopy structures at 2.8 to 3.3 Å resolution of transcribing and unbound human Pol III. We observe insertion of the TFIIS-like subunit RPC10 into the polymerase funnel, providing insights into how RPC10 triggers transcription termination. Our structures resolve elements absent from Saccharomyces cerevisiae Pol III such as the winged-helix domains of RPC5 and an iron–sulfur cluster, which tethers the heterotrimer subcomplex to the core. The cancer-associated RPC7α isoform binds the polymerase clamp, potentially interfering with Pol III inhibition by tumor suppressor MAF1, which may explain why overexpressed RPC7α enhances tumor transformation. Finally, the human Pol III structure allows mapping of disease-related mutations and may contribute to the development of inhibitors that selectively target Pol III for therapeutic interventions. Cryo-EM structures of human Pol III in both apo- and elongating states reveal metazoan-specific differences in the regulation of transcription termination and identify mutations relevant to human disease.

Journal Article

DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets

by Bateman, Alex , Russo, Elena Tea , Punta, Marco in Algorithms , Amino Acid Sequence , Annotations

2022

Proteins that are known only at a sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for yet unannotated sequences. Existing domain family resources typically use at least some degree of manual curation: they grow slowly over time and leave a large fraction of the protein sequence space unclassified. We here describe automatic clustering by Density Peak Clustering of UniRef50 v. 2017_07, a protein sequence database including approximately 23M sequences. We performed a radical re-implementation of a pipeline we previously developed in order to allow handling millions of sequences and data volumes of the order of 3 TeraBytes. The modified pipeline, which we call DPCfam, finds ∼ 45,000 protein clusters in UniRef50. Our automatic classification is in close correspondence to the ones of the Pfam and ECOD resources: in particular, about 81% of medium-large Pfam families and 72% of ECOD families can be mapped to clusters generated by DPCfam. In addition, our protocol finds more than 14,000 clusters constituted of protein regions with no Pfam annotation, which are therefore candidates for representing novel protein families. These results are made available to the scientific community through a dedicated repository.
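For readers unfamiliar with Density Peak Clustering, the general idea can be sketched on toy data. This is a minimal illustration of the generic algorithm on 2D points, not the DPCfam pipeline, which works on protein sequence regions at a very different scale. The function name, the `cutoff` and `n_clusters` parameters, and the rule of picking centres by the largest density-times-distance product are assumptions made for this sketch.

```python
import math


def density_peak_clustering(points, cutoff, n_clusters):
    """Toy Density Peak Clustering on small point sets (O(n^2) distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point that ranks higher in density.
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Centres combine high density with a large distance to any denser point.
    centres = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    labels = [-1] * n
    for cluster_id, i in enumerate(centres):
        labels[i] = cluster_id

    # Every other point inherits the label of its nearest denser neighbour,
    # which is already labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(density_peak_clustering(toy, cutoff=1.5, n_clusters=2))
```

In practice the centres are usually chosen by inspecting a plot of density against distance rather than by fixing their number in advance; the fixed `n_clusters` here only keeps the sketch short.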

Journal Article

Expanding the repertoire of human tandem repeat RNA-binding proteins

by Bateman, Alex , Ormazábal, Agustín , Carletti, Matías Sebastián in Amino acids , Binding , Biology and Life Sciences

2023

Protein regions consisting of arrays of tandem repeats are known to bind other molecular partners, including nucleic acid molecules. Although the interactions between repeat proteins and DNA are already widely explored, studies characterising tandem repeat RNA-binding proteins are lacking. We performed a large-scale analysis of human proteins devoted to expanding the knowledge about tandem repeat proteins experimentally reported as RNA-binding molecules. This work is timely because of the release of a full set of accurate structural models for the human proteome amenable to repeat detection using structural methods. The main goal of our analysis was to build a comprehensive set of human RNA-binding proteins that contain repeats at the sequence or structure level. Our results showed that the combination of sequence and structural methods finds significantly more tandem repeat proteins than either method alone. We identified 219 tandem repeat proteins that bind RNA molecules and characterised the overlap between repeat regions and RNA-binding regions as a first step towards assessing their functional relationship. We observed differences in the characteristics of repeat regions predicted by sequence-based or structure-based methods in terms of their sequence composition, their functions and their protein domains.

Journal Article

Sparcle: assigning transcripts to cells in multiplexed images

by Prabhakaran, Sandhya in Application Note , Bioinformatics , Fluorescence in situ hybridization

2022

Motivation: Imaging-based spatial transcriptomics has the power to reveal patterns of single-cell gene expression by detecting mRNA transcripts as individually resolved spots in multiplexed images. However, molecular quantification has been severely limited by the computational challenges of segmenting poorly outlined, overlapping cells and of overcoming technical noise; the majority of transcripts are routinely discarded because they fall outside the segmentation boundaries. This lost information leads to less accurate gene count matrices and weakens downstream analyses, such as cell type or gene program identification. Results: Here, we present Sparcle, a probabilistic model that reassigns transcripts to cells based on gene covariation patterns and incorporates spatial features such as distance to nucleus. We demonstrate its utility on both multiplexed error-robust fluorescence in situ hybridization, single-molecule FISH data, probabilistic cell typing in situ sequencing, spatially resolved transcript amplicon readout mapping and MERFISH from Vizgen. Sparcle improves transcript assignment, providing more realistic per-cell quantification of each gene, better delineation of cell boundaries and improved cluster assignments. Critically, our approach does not require an accurate segmentation and is agnostic to technological platform. Availability and implementation: The code is available at https://github.com/sandhya212/Sparcle_for_spot_reassignments. Contact: sandhya.prabhakaran@moffitt.org. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Journal Article

Periscope Proteins are variable-length regulators of bacterial cell surface interactions

by Baumann, Christoph G. , Dégut, Clément , Jenkins, Huw T. in Bacteria , Bacterial Proteins - chemistry , Bacterial Proteins - genetics

2021

Changes at the cell surface enable bacteria to survive in dynamic environments, such as diverse niches of the human host. Here, we reveal “Periscope Proteins” as a widespread mechanism of bacterial surface alteration mediated through protein length variation. Tandem arrays of highly similar folded domains can form an elongated rod-like structure; thus, variation in the number of domains determines how far an N-terminal host ligand binding domain projects from the cell surface. Supported by newly available long-read genome sequencing data, we propose that this class could contain over 50 distinct proteins, including those implicated in host colonization and biofilm formation by human pathogens. In large multidomain proteins, sequence divergence between adjacent domains appears to reduce interdomain misfolding. Periscope Proteins break this “rule,” suggesting that their length variability plays an important role in regulating bacterial interactions with host surfaces, other bacteria, and the immune system.

Journal Article
