129 result(s) for "Bateman, Alex"
Highly accurate protein structure prediction for the human proteome
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective. AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.
Ten simple rules to make your computing more environmentally sustainable
[...]trees are essential to eliminate excess CO2; on average, a mature tree can sequester 11,000 gCO2 per year [12]. Since it depends on the energy needed to power the computer and the carbon footprint of producing such energy, it can be calculated fairly accurately. [...]the end-to-end environmental impact of computers and data centres is substantial but difficult to quantify. [...]try to use your gear for as long as is reasonable.
Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes
Mobile genetic elements (MGEs) sequester and mobilize antibiotic resistance genes across bacterial genomes. Efficient and reliable identification of such elements is necessary to follow resistance spreading. However, automated tools for MGE identification are missing. Tyrosine recombinase (YR) proteins drive MGE mobilization and could provide markers for MGE detection, but they constitute a diverse family also involved in housekeeping functions. Here, we conducted a comprehensive survey of YRs from bacterial, archaeal, and phage genomes and developed a sequence‐based classification system that dissects the characteristics of MGE‐borne YRs. We revealed that MGE‐related YRs evolved from non‐mobile YRs by acquisition of a regulatory arm‐binding domain that is essential for their mobility function. Based on these results, we further identified numerous unknown MGEs. This work provides a resource for comparative analysis and functional annotation of YRs and aids the development of computational tools for MGE annotation. Additionally, we reveal how YRs adapted to drive gene transfer across species and provide a tool to better characterize antibiotic resistance dissemination. SYNOPSIS A systematic resource for tyrosine recombinase annotation is presented. Comparative sequence analysis of the protein family enables the functional classification of these enzymes and the identification of mobile genetic elements in bacterial genomes. Phylogenetic analysis of the tyrosine recombinase protein family classifies its members into twenty subgroups. Members of the subgroups have a specific function, sequence features and host taxonomy. Tyrosine recombinases of mobile genetic elements carry an additional arm‐binding domain. Tyrosine recombinase classification enables the identification of new mobile genetic elements in bacterial genomes.
Uncovering new families and folds in the natural protein universe
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this ‘dark matter’ of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin–antitoxin systems, TumE–TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology. The extent to which the AlphaFold database has structurally illuminated proteins that are challenging to annotate for function or putative biological role using standard homology-based approaches at high predicted accuracy is investigated.
Discovery of fibrillar adhesins across bacterial species
Background: Fibrillar adhesins are long multidomain proteins that form filamentous structures at the cell surface of bacteria. They are an important yet understudied class of proteins composed of adhesive and stalk domains that mediate interactions of bacteria with their environment. This study aims to characterize fibrillar adhesins in a wide range of bacterial phyla and to identify new fibrillar adhesin-like proteins to improve our understanding of host-bacteria interactions. Results: Through careful literature and computational searches, we identified 82 stalk and 27 adhesive domain families in fibrillar adhesins. Based on the presence of these domains in the UniProt Reference Proteomes database, we identified and analysed 3,542 fibrillar adhesin-like proteins across species of the most common bacterial phyla. We further enumerate the adhesive and stalk domain combinations found in nature and demonstrate that fibrillar adhesins have complex and variable domain architectures, which differ across species. By analysing the domain architecture of fibrillar adhesins, we show that in Gram positive bacteria, adhesive domains are mostly positioned at the N-terminus and cell surface anchors at the C-terminus of the protein, while their positions are more variable in Gram negative bacteria. We provide an open repository of fibrillar adhesin-like proteins and domains to enable further studies of this class of bacterial surface proteins. Conclusion: This study provides a domain-based characterization of fibrillar adhesins and demonstrates that they are widely found in species across the main bacterial phyla. We have discovered numerous novel fibrillar adhesins and improved our understanding of pathogenic adhesion and invasion mechanisms.
Cryo-EM structures of human RNA polymerase III in its unbound and transcribing states
RNA polymerase III (Pol III) synthesizes transfer RNAs and other short, essential RNAs. Human Pol III misregulation is linked to tumor transformation, neurodegenerative and developmental disorders, and increased sensitivity to viral infections. Here, we present cryo-electron microscopy structures at 2.8 to 3.3 Å resolution of transcribing and unbound human Pol III. We observe insertion of the TFIIS-like subunit RPC10 into the polymerase funnel, providing insights into how RPC10 triggers transcription termination. Our structures resolve elements absent from Saccharomyces cerevisiae Pol III such as the winged-helix domains of RPC5 and an iron–sulfur cluster, which tethers the heterotrimer subcomplex to the core. The cancer-associated RPC7α isoform binds the polymerase clamp, potentially interfering with Pol III inhibition by tumor suppressor MAF1, which may explain why overexpressed RPC7α enhances tumor transformation. Finally, the human Pol III structure allows mapping of disease-related mutations and may contribute to the development of inhibitors that selectively target Pol III for therapeutic interventions. Cryo-EM structures of human Pol III in both apo- and elongating states reveal metazoan-specific differences in the regulation of transcription termination and identify mutations relevant to human disease.
DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets
Proteins that are known only at a sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for yet unannotated sequences. Existing domain family resources typically use at least some degree of manual curation: they grow slowly over time and leave a large fraction of the protein sequence space unclassified. We here describe automatic clustering by Density Peak Clustering of UniRef50 v. 2017_07, a protein sequence database including approximately 23M sequences. We performed a radical re-implementation of a pipeline we previously developed in order to allow handling millions of sequences and data volumes of the order of 3 TeraBytes. The modified pipeline, which we call DPCfam, finds ∼45,000 protein clusters in UniRef50. Our automatic classification is in close correspondence to the ones of the Pfam and ECOD resources: in particular, about 81% of medium-large Pfam families and 72% of ECOD families can be mapped to clusters generated by DPCfam. In addition, our protocol finds more than 14,000 clusters constituted of protein regions with no Pfam annotation, which are therefore candidates for representing novel protein families. These results are made available to the scientific community through a dedicated repository.
Expanding the repertoire of human tandem repeat RNA-binding proteins
Protein regions consisting of arrays of tandem repeats are known to bind other molecular partners, including nucleic acid molecules. Although the interactions between repeat proteins and DNA are already widely explored, studies characterising tandem repeat RNA-binding proteins are lacking. We performed a large-scale analysis of human proteins devoted to expanding the knowledge about tandem repeat proteins experimentally reported as RNA-binding molecules. This work is timely because of the release of a full set of accurate structural models for the human proteome amenable to repeat detection using structural methods. The main goal of our analysis was to build a comprehensive set of human RNA-binding proteins that contain repeats at the sequence or structure level. Our results showed that the combination of sequence and structural methods finds significantly more tandem repeat proteins than either method alone. We identified 219 tandem repeat proteins that bind RNA molecules and characterised the overlap between repeat regions and RNA-binding regions as a first step towards assessing their functional relationship. We observed differences in the characteristics of repeat regions predicted by sequence-based or structure-based methods in terms of their sequence composition, their functions and their protein domains.
Sparcle: assigning transcripts to cells in multiplexed images
Motivation: Imaging-based spatial transcriptomics has the power to reveal patterns of single-cell gene expression by detecting mRNA transcripts as individually resolved spots in multiplexed images. However, molecular quantification has been severely limited by the computational challenges of segmenting poorly outlined, overlapping cells and of overcoming technical noise; the majority of transcripts are routinely discarded because they fall outside the segmentation boundaries. This lost information leads to less accurate gene count matrices and weakens downstream analyses, such as cell type or gene program identification. Results: Here, we present Sparcle, a probabilistic model that reassigns transcripts to cells based on gene covariation patterns and incorporates spatial features such as distance to nucleus. We demonstrate its utility on both multiplexed error-robust fluorescence in situ hybridization, single-molecule FISH data, probabilistic cell typing in situ sequencing, spatially resolved transcript amplicon readout mapping and MERFISH from Vizgen. Sparcle improves transcript assignment, providing more realistic per-cell quantification of each gene, better delineation of cell boundaries and improved cluster assignments. Critically, our approach does not require an accurate segmentation and is agnostic to technological platform. Availability and implementation: The code is available at: https://github.com/sandhya212/Sparcle_for_spot_reassignments Contact: sandhya.prabhakaran@moffitt.org Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Periscope Proteins are variable-length regulators of bacterial cell surface interactions
Changes at the cell surface enable bacteria to survive in dynamic environments, such as diverse niches of the human host. Here, we reveal “Periscope Proteins” as a widespread mechanism of bacterial surface alteration mediated through protein length variation. Tandem arrays of highly similar folded domains can form an elongated rod-like structure; thus, variation in the number of domains determines how far an N-terminal host ligand binding domain projects from the cell surface. Supported by newly available long-read genome sequencing data, we propose that this class could contain over 50 distinct proteins, including those implicated in host colonization and biofilm formation by human pathogens. In large multidomain proteins, sequence divergence between adjacent domains appears to reduce interdomain misfolding. Periscope Proteins break this “rule,” suggesting that their length variability plays an important role in regulating bacterial interactions with host surfaces, other bacteria, and the immune system.