A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations

by Petruschke, Hannes; Anders, John; Haange, Sven-Bastiaan; Jehmlich, Nico; von Bergen, Martin; Stadler, Peter F

Algorithms / Amino acids / Annotations / Bioinformatics / Biomedical and Life Sciences / Candidates / Computational biology / Computational Biology/Bioinformatics / Computer Appl. in Life Sciences / Data mining / E coli / Gene mapping / Genomes / Genomics / Homology / Identification / Identification and classification / Identification methods / Intestinal microflora / Life Sciences / Metaproteogenomics / Methods / Microarrays / Microbial communities / Microbiomes / Open reading frames / Peptide mapping / Peptide-spectrum matches / Peptides / Prokaryotes / Proteins / Proteomics / Small proteins / Species / Workflow

2021

Journal Article

Overview

Background Small proteins have received increasing attention in recent years. In particular, they have been implicated as signals that contribute to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins, because annotation pipelines often exclude short open reading frames (ORFs) or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and of small proteins (sProteins) in particular, therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and treats all possible ORFs as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically, however, this yields large numbers of putative novel small proteins, a large fraction of which are false-positive predictions.

Results We observe that the number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for distinguishing genuine proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false positives, i.e., ORFs that are most likely not translated. We applied it to the artificial gut microbiome model SIHUMIx, which comprises eight species, and validated 5114 proteins that had previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at both the proteomic and the transcriptomic level. About half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 amino acids) and are most likely bona fide novel proteins.
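
The idea of condensing all PSMs that map to a candidate ORF into a few per-ORF descriptors can be illustrated with a short Python sketch. The record layout (`PSM`, `OrfDescriptor`), the particular descriptors (PSM count, number of distinct peptides, best score, best q-value) and the filtering thresholds are hypothetical choices made for illustration only. They are not the descriptors or the decision rule used by the authors, and the sketch ignores the local genomic context that the workflow also takes into account.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    # Hypothetical minimal PSM record: the ORF it maps to, the peptide
    # sequence, a search-engine score (higher is better) and a q-value.
    orf_id: str
    peptide: str
    score: float
    q_value: float


@dataclass
class OrfDescriptor:
    # Hypothetical per-ORF summary of all PSMs mapped to that ORF.
    orf_id: str
    n_psms: int
    n_unique_peptides: int
    best_score: float
    min_q_value: float


def aggregate_psms(psms):
    """Group PSMs by ORF and summarise each group as one descriptor."""
    by_orf = defaultdict(list)
    for psm in psms:
        by_orf[psm.orf_id].append(psm)
    return [
        OrfDescriptor(
            orf_id=orf_id,
            n_psms=len(hits),
            n_unique_peptides=len({h.peptide for h in hits}),
            best_score=max(h.score for h in hits),
            min_q_value=min(h.q_value for h in hits),
        )
        for orf_id, hits in by_orf.items()
    ]


def likely_proteins(descriptors, min_peptides=2, max_q=0.01):
    """Keep ORFs supported by several distinct, confidently identified
    peptides. The thresholds are illustrative placeholders only."""
    return sorted(
        d.orf_id
        for d in descriptors
        if d.n_unique_peptides >= min_peptides and d.min_q_value <= max_q
    )


# Usage with made-up PSMs: orf1 has two distinct peptides, orf2 only one.
psms = [
    PSM("orf1", "PEPTIDEK", 45.0, 0.001),
    PSM("orf1", "ANOTHERPEPK", 38.2, 0.004),
    PSM("orf1", "PEPTIDEK", 41.0, 0.002),
    PSM("orf2", "LONEPEPR", 12.5, 0.03),
]
print(likely_proteins(aggregate_psms(psms)))  # ['orf1']
```

Counting distinct peptides rather than raw PSMs is one way to keep a single peptide that was identified many times from looking like strong evidence for an ORF.
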
Conclusions Aggregating PSM quality information for predicted ORFs provides a robust and efficient method for identifying novel proteins in proteomics data. In particular, the workflow can identify small proteins and frameshift variants. Because PSMs are explicitly mapped to genomic locations, the workflow also facilitates the integration of transcriptomics data and other sources of genome-level information.
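
Mapping a PSM to a genomic location essentially means converting the peptide's amino-acid offset within the translated ORF into nucleotide coordinates while respecting the strand. The Python sketch below is one possible convention, not the authors' implementation. It assumes a hypothetical `Orf` record with 0-based, half-open genomic coordinates and an ORF whose length `end - start` is exactly three times the length of its translation (stop codon excluded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    # Hypothetical ORF record; coordinates are 0-based and half-open,
    # and end - start == 3 * len(protein) (stop codon not included).
    orf_id: str
    contig: str
    start: int
    end: int
    strand: str   # '+' or '-'
    protein: str  # amino-acid translation of the ORF


def peptide_to_genome(orf, peptide):
    """Return (contig, start, end, strand) for every occurrence of
    `peptide` in the ORF's translation, in 0-based half-open coordinates."""
    hits = []
    pos = orf.protein.find(peptide)
    while pos != -1:
        aa_end = pos + len(peptide)
        if orf.strand == "+":
            # Codon i occupies [start + 3i, start + 3i + 3).
            g_start = orf.start + 3 * pos
            g_end = orf.start + 3 * aa_end
        elif orf.strand == "-":
            # On the reverse strand translation runs downwards from `end`,
            # so codon i occupies [end - 3(i + 1), end - 3i).
            g_start = orf.end - 3 * aa_end
            g_end = orf.end - 3 * pos
        else:
            raise ValueError(f"unknown strand {orf.strand!r}")
        hits.append((orf.contig, g_start, g_end, orf.strand))
        pos = orf.protein.find(peptide, pos + 1)
    return hits


plus_orf = Orf("orfA", "contig1", 100, 130, "+", "MKTAYIAKQR")
minus_orf = Orf("orfB", "contig1", 200, 230, "-", "MKTAYIAKQR")
print(peptide_to_genome(plus_orf, "TAYI"))   # [('contig1', 106, 118, '+')]
print(peptide_to_genome(minus_orf, "TAYI"))  # [('contig1', 212, 224, '-')]
```

Intervals in this form can then be intersected with, for example, RNA-seq read coverage to check whether a candidate ORF also has transcriptomic support.
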

Publisher

BioMed Central, BioMed Central Ltd, Springer Nature B.V., BMC

Subject

/ Biomedical and Life Sciences

/ Candidates

/ Computational biology