Catalogue Search | MBRL

PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data

by Noble, William Stafford , Searle, Brian C , Bollinger, James G in 631/114/2400 , 631/114/2784 , 631/114/794

2017

A library-free, peptide-centric search tool, PECAN, robustly identifies peptides from data-independent acquisition mass-spectrometry-based proteomics data. Data-independent acquisition (DIA) is an emerging mass spectrometry (MS)-based technique for unbiased and reproducible measurement of protein mixtures. DIA tandem mass spectrometry spectra are often highly multiplexed, containing product ions from multiple cofragmenting precursors. Detecting peptides directly from DIA data is therefore challenging; most DIA data analyses require spectral libraries. Here we present PECAN (http://pecan.maccosslab.org), a library-free, peptide-centric tool that robustly and accurately detects peptides directly from DIA data. PECAN reports evidence of detection based on product ion scoring, which enables detection of low-abundance analytes with poor precursor ion signal. We demonstrate the chromatographic peak picking accuracy and peptide detection capability of PECAN, and we further validate its detection with data-dependent acquisition and targeted analyses. Lastly, we used PECAN to build a plasma proteome library from DIA data and to query known sequence variants.

Journal Article

Evaluating a large language model’s ability to solve programming exercises from an introductory bioinformatics course

by Piccolo, Stephen R. , Payne, Samuel H. , Ridge, Perry G. in Analysis , Artificial Intelligence , Bioinformatics

2023

Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists’ efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool—OpenAI’s ChatGPT—could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.

Journal Article

Detecting fabrication in large-scale molecular omics data

by Bradshaw, Michael S. , Payne, Samuel H. in Accuracy , Analysis , Big Data

2021

Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big-data, fraud detection methods must be updated for sophisticated computational fraud. In the financial sector, machine learning and digit-frequencies are successfully used to detect fraud. Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omic experiments. Using the gene copy-number data as input, machine learning models correctly predicted fraud with 58–100% accuracy. With digit frequency as input features, the models detected fraud with 82–100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.

Journal Article

Identification of Novel Protein Lysine Acetyltransferases in Escherichia coli

by Kuhn, Misty L. , Schilling, Birgit , Baumgartner, Jackson T. in Acetyl phosphate , Acetylation , acetyltransferase

2018

Posttranslational modifications, such as Nε-lysine acetylation, regulate protein function. Nε-lysine acetylation can occur either nonenzymatically or enzymatically. The nonenzymatic mechanism uses acetyl phosphate (AcP) or acetyl coenzyme A (AcCoA) as acetyl donor to modify an Nε-lysine residue of a protein. The enzymatic mechanism uses Nε-lysine acetyltransferases (KATs) to specifically transfer an acetyl group from AcCoA to Nε-lysine residues on proteins. To date, only one KAT (YfiQ, also known as Pka and PatZ) has been identified in Escherichia coli. Here, we demonstrate the existence of 4 additional E. coli KATs: RimI, YiaC, YjaB, and PhnO. In a genetic background devoid of all known acetylation mechanisms (most notably AcP and YfiQ) and one deacetylase (CobB), overexpression of these putative KATs elicited unique patterns of protein acetylation. We mutated key active site residues and found that most of them eliminated enzymatic acetylation activity. We used mass spectrometry to identify and quantify the specificity of YfiQ and the four novel KATs. Surprisingly, our analysis revealed a high degree of substrate specificity. The overlap between KAT-dependent and AcP-dependent acetylation was extremely limited, supporting the hypothesis that these two acetylation mechanisms play distinct roles in the posttranslational modification of bacterial proteins. We further showed that these novel KATs are conserved across broad swaths of bacterial phylogeny. Finally, we determined that one of the novel KATs (YiaC) and the known KAT (YfiQ) can negatively regulate bacterial migration. Together, these results emphasize distinct and specific nonenzymatic and enzymatic protein acetylation mechanisms present in bacteria. IMPORTANCE Nε-Lysine acetylation is one of the most abundant and important posttranslational modifications across all domains of life. One of the best-studied effects of acetylation occurs in eukaryotes, where acetylation of histone tails activates gene transcription. Although bacteria do not have true histones, Nε-lysine acetylation is prevalent; however, the role of these modifications is mostly unknown. We constructed an E. coli strain that lacked both known acetylation mechanisms to identify four new Nε-lysine acetyltransferases (RimI, YiaC, YjaB, and PhnO). We used mass spectrometry to determine the substrate specificity of these acetyltransferases. Structural analysis of selected substrate proteins revealed site-specific preferences for enzymatic acetylation that had little overlap with the preferences of the previously reported acetyl-phosphate nonenzymatic acetylation mechanism. Finally, YiaC and YfiQ appear to regulate flagellum-based motility, a phenotype critical for pathogenesis of many organisms. These acetyltransferases are highly conserved and reveal deeper and more complex roles for bacterial posttranslational modification.

Journal Article

Discovery and revision of Arabidopsis genes by proteogenomics

by Stanke, Mario , Castellana, Natalie E , Shen, Zhouxin in Amino acid sequence , amino acid sequences , Amino acids

2008

Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.

Journal Article

Inhibition of interleukin-1 receptor-associated kinase-1 is a therapeutic strategy for acute myeloid leukemia subtypes

by Davare, Monika A , Elferich, Johannes , Hosseini, Mona M in Abnormalities , Acute myeloid leukemia , Computer applications

2018

Interleukin-1 receptor-associated kinase 1 (IRAK1), an essential mediator of innate immunity and inflammatory responses, is constitutively active in multiple cancers. We evaluated the role of IRAK1 in acute myeloid leukemia (AML) and assessed the inhibitory activity of multikinase inhibitor pacritinib on IRAK1 in AML. We demonstrated that IRAK1 is overexpressed in AML and provides a survival signal to AML cells. Genetic knockdown of IRAK1 in primary AML samples and xenograft model showed a significant reduction in leukemia burden. Kinase profiling indicated pacritinib has potent inhibitory activity against IRAK1. Computational modeling combined with site-directed mutagenesis demonstrated high-affinity binding of pacritinib to the IRAK1 kinase domain. Pacritinib exposure reduced IRAK1 phosphorylation in AML cells. A higher percentage of primary AML samples showed robust sensitivity to pacritinib, which inhibits FLT3, JAK2, and IRAK1, relative to FLT3 inhibitor quizartinib or JAK1/2 inhibitor ruxolitinib, demonstrating the importance of IRAK1 inhibition. Pacritinib inhibited the growth of AML cells harboring a variety of genetic abnormalities not limited to FLT3 and JAK2. Pacritinib treatment reduced AML progenitors in vitro and the leukemia burden in AML xenograft model. Overall, IRAK1 contributes to the survival of leukemic cells, and the suppression of IRAK1 may be beneficial among heterogeneous AML subtypes.

Journal Article

Ancient Regulatory Role of Lysine Acetylation in Central Metabolism

by Nakayasu, Ernesto S. , Plutz, Matthew J. , Shukla, Anil K. in Acetylation , acetylphosphate , Active sites

2017

Lysine acetylation is a common protein post-translational modification in bacteria and eukaryotes. Unlike phosphorylation, whose functional role in signaling has been established, it is unclear what regulatory mechanism acetylation plays and whether it is conserved across evolution. By performing a proteomic analysis of 48 phylogenetically distant bacteria, we discovered conserved acetylation sites on catalytically essential lysine residues that are invariant throughout evolution. Lysine acetylation removes the residue’s charge and changes the shape of the pocket required for substrate or cofactor binding. Two-thirds of glycolytic and tricarboxylic acid (TCA) cycle enzymes are acetylated at these critical sites. Our data suggest that acetylation may play a direct role in metabolic regulation by switching off enzyme activity. We propose that protein acetylation is an ancient and widespread mechanism of protein activity regulation. IMPORTANCE Post-translational modifications can regulate the activity and localization of proteins inside the cell. Similar to phosphorylation, lysine acetylation is present in both eukaryotes and prokaryotes and modifies hundreds to thousands of proteins in cells. However, how lysine acetylation regulates protein function and whether such a mechanism is evolutionarily conserved is still poorly understood. Here, we investigated evolutionary and functional aspects of lysine acetylation by searching for acetylated lysines in a comprehensive proteomic data set from 48 phylogenetically distant bacteria. We found that lysine acetylation occurs in evolutionarily conserved lysine residues in catalytic sites of enzymes involved in central carbon metabolism. Moreover, this modification inhibits enzymatic activity. Our observations suggest that lysine acetylation is an evolutionarily conserved mechanism of controlling central metabolic activity by directly blocking enzyme active sites. 

Journal Article

Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study

by Smith, Richard D. , Venter, Eli , Payne, Samuel H. in Abnormalities , Amino acids , Analysis

2011

Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.

Journal Article

Regulation of bacterial stringent response by an evolutionarily conserved ribosomal protein L11 methylation

by Farris, Yuliya , Nakayasu, Ernesto S. , Burnet, Meagan C. in Amino acids , arginine methylation , Bacteria

2024

Protein methylation in bacteria was first identified over 60 years ago. Since then, its functional role has been identified for only a few proteins. To better understand the functional role of methylation in bacteria, we analyzed a large phyloproteomics data set encompassing 48 diverse bacteria. Our analysis revealed that ribosomal proteins are often methylated at conserved residues, suggesting that methylation of these sites may have a functional role in translation. Further analysis revealed that methylation of ribosomal protein L11 is important for stringent response signaling and ribosomal homeostasis.

Journal Article

A proteomic meta-analysis refinement of plasma extracellular vesicles

by Nakayasu, Ernesto S. , Sims, Emily K. , Huang, Fei in 631/337/475 , 631/80/313 , 692/53

2023

Extracellular vesicles play major roles in cell-to-cell communication and are excellent biomarker candidates. However, studying plasma extracellular vesicles is challenging due to contaminants. Here, we performed a proteomics meta-analysis of public data to refine the plasma EV composition by separating EV proteins and contaminants into different clusters. We obtained two clusters with a total of 1717 proteins that were depleted of known contaminants and enriched in EV markers with independently validated 71% true-positive. These clusters had 133 clusters of differentiation (CD) antigens and were enriched with proteins from cell-to-cell communication and signaling. We compared our data with the proteins deposited in PeptideAtlas, making our refined EV protein list a resource for mechanistic and biomarker studies. As a use case example for this resource, we validated the type 1 diabetes biomarker proplatelet basic protein in EVs and showed that it regulates apoptosis of β cells and macrophages, two key players in the disease development. Our approach provides a refinement of the EV composition and a resource for the scientific community.

Journal Article
