87 result(s) for "Peptide Mapping - statistics"
MALDI-TOF mass spectrometry on intact bacteria combined with a refined analysis framework allows accurate classification of MSSA and MRSA
Fast and reliable detection coupled with accurate data-processing and analysis of antibiotic-resistant bacteria is essential in clinical settings. In this study, we use MALDI-TOF on intact cells combined with a refined analysis framework to demonstrate discrimination between methicillin-susceptible (MSSA) and methicillin-resistant (MRSA) Staphylococcus aureus. By combining supervised and unsupervised machine learning methods, we first show that the mass spectroscopy data contain a strong signal for the clustering of MSSA and MRSA. Then we concentrate on applying supervised learning to extract and verify the important features. A new workflow is proposed that allows for extracting a fixed set of reference peaks so that any new data can be aligned to it and hence consistent feature matrices can be obtained. Also note that by doing so we are able to examine the robustness of the important features that have been found. We also show that appropriate size of the benchmark data, appropriate alignment of the testing data and use of an optimal set of features via feature selection result in prediction accuracy over 90%. In summary, as proof-of-principle, our integrated experimental and bioinformatics study suggests a novel intact cell MALDI-TOF to be of great promise for fast and reliable detection of MRSA strains.
Relative, Label-free Protein Quantitation: Spectral Counting Error Statistics from Nine Replicate MudPIT Samples
Nine replicate samples of peptides from soybean leaves, each spiked with a different concentration of bovine apotransferrin peptides, were analyzed on a mass spectrometer using multidimensional protein identification technology (MudPIT). Proteins were detected from the peptide tandem mass spectra, and the numbers of spectra were statistically evaluated for variation between samples. The results corroborate prior knowledge that combining spectra from replicate samples increases the number of identifiable proteins and that a summed spectral count for a protein increases linearly with increasing molar amounts of protein. Furthermore, statistical analysis of spectral counts for proteins in two- and three-way comparisons between replicates and combined replicates revealed little significant variation arising from run-to-run differences or data-dependent instrument ion sampling that might falsely suggest differential protein accumulation. In these experiments, spectral counting was enabled by PANORAMICS, probability-based software that predicts proteins detected by sets of observed peptides. Three alternative approaches to counting spectra were also evaluated by comparison. As the counting thresholds were changed from weaker to more stringent, the accuracy of ratio determination also changed. These results suggest that thresholds for counting can be empirically set to improve relative quantitation. All together, the data confirm the accuracy and reliability of label-free spectral counting in the relative, quantitative analysis of proteins between samples. This report confirms the statistical accuracy and reliability of label-free spectral counting for dissecting subtle changes in relative protein accumulation above normal background noise.
Data Self-Recalibration and Mixture Mass Fingerprint Searching (DASER-MMF) to Enhance Protein Identification within Complex Mixtures
A novel algorithm based on Data Self-Recalibration and a subsequent Mixture Mass Fingerprint search (DASER-MMF) has been developed to improve the performance of protein identification from online 1D and 2D-LC-MS/MS experiments conducted on high-resolution mass spectrometers. Recalibration of 40% to 75% of the MS spectra in a human serum dataset is demonstrated with average errors of 0.3 ± 0.3 ppm, regardless of the original calibration quality. With simple protein mixtures, the MMF search identifies new proteins not found in the MS/MS based search and increases the sequence coverage for identified proteins by six times. The high mass accuracy allows proteins to be identified with as little as three peptide mass hits. When applied to very complex samples, the MMF search shows less dramatic performance improvements. However, refinements such as additional discriminating factors utilized within the search space provide significant gains in protein identification ability and indicate that further enhancements are possible in this realm. A novel algorithm based on data self-recalibration and a mixture mass fingerprint search (DASER-MMF) improves the performance of protein identification from LC-MS/MS experiments.
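The abstract above reports recalibration errors in parts per million (ppm). As a minimal sketch of how such a relative mass error is usually computed, and not the DASER-MMF algorithm itself, the following uses two hypothetical helper names, `ppm_error` and `within_tolerance`, and an assumed default tolerance of 1 ppm:

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error of an observed m/z value, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 1.0) -> bool:
    """True if the absolute ppm error is within tol_ppm (assumed default: 1 ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(round(ppm_error(500.0002, 500.0), 3))
    print(within_tolerance(500.0002, 500.0))
```

A tighter tolerance admits fewer candidate peptide masses per peak, which is consistent with the abstract's point that high mass accuracy lets a protein be identified from only a few mass hits.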
Flexsim-R: A virtual affinity fingerprint descriptor to calculate similarities of functional groups
Methods to describe the similarity of fragments occurring in drug-like molecules are of fundamental importance in computational drug design. In the early phase of lead discovery, they can help to select diverse building blocks for combinatorial compound libraries intended for broad screening. In lead optimization, such methods can guide bioisosteric replacements of one functional group by another or serve as descriptors for QSAR calculations. In this paper, we outline the development of a novel 3D descriptor, termed Flexsim-R, which is a further extension of our virtual affinity fingerprint idea. Descriptors are calculated based on docking of small fragments such as building blocks for combinatorial chemistry or functional groups of drug-like molecules into a reference panel of protein binding sites. The method is validated by examining the neighborhood behavior of the affinity fingerprints and by deriving predictive QSAR models for a couple of literature peptide data sets.
Mapping genomic loci implicates genes and synaptic biology in schizophrenia
Schizophrenia has a heritability of 60-80%, much of which is attributable to common risk alleles. Here, in a two-stage genome-wide association study of up to 76,755 individuals with schizophrenia and 243,649 control individuals, we report common variant associations at 287 distinct genomic loci. Associations were concentrated in genes that are expressed in excitatory and inhibitory neurons of the central nervous system, but not in other tissues or cell types. Using fine-mapping and functional genomic data, we identify 120 genes (106 protein-coding) that are likely to underpin associations at some of these loci, including 16 genes with credible causal non-synonymous or untranslated region variation. We also implicate fundamental processes related to neuronal function, including synaptic organization, differentiation and transmission. Fine-mapped candidates were enriched for genes associated with rare disruptive coding variants in people with schizophrenia, including the glutamate receptor subunit GRIN2A and transcription factor SP4, and were also enriched for genes implicated by such variants in neurodevelopmental disorders. We identify biological processes relevant to schizophrenia pathophysiology; show convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders; and provide a resource of prioritized genes and variants to advance mechanistic studies.
Structure-based protein function prediction using graph convolutional networks
The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/.
Mapping allosteric communications within individual proteins
Allostery in proteins influences various biological processes such as regulation of gene transcription and activities of enzymes and cell signaling. Computational approaches for analysis of allosteric coupling provide inexpensive opportunities to predict mutations and to design small-molecule agents to control protein function and cellular activity. We develop a computationally efficient network-based method, Ohm, to identify and characterize allosteric communication networks within proteins. Unlike previously developed simulation-based approaches, Ohm relies solely on the structure of the protein of interest. We use Ohm to map allosteric networks in a dataset composed of 20 proteins experimentally identified to be allosterically regulated. Further, the Ohm allostery prediction for the protein CheY correlates well with NMR CHESCA studies. Our webserver, Ohm.dokhlab.org, automatically determines allosteric network architecture and identifies critical coupled residues within this network. The computational prediction of protein allostery can guide experimental studies of protein function and cellular activity. Here, the authors develop a network-based method to detect allosteric coupling within proteins solely based on their structures, and set up a webserver for allostery prediction.
Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses
The statistical concepts for false discovery rate control long applied in the field of data-dependent acquisition (DDA) mass spectrometry-based proteomics can be adapted for the emerging technique of data-independent acquisition (DIA) mass spectrometry. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful considerations of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
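The abstract above concerns false discovery rate (FDR) control. As a rough illustration of the general idea, the sketch below implements a generic target-decoy FDR and q-value estimate. It is not the peptide-centric procedure or the global analyte constraint proposed in the paper. The function name `target_decoy_qvalues` and the input layout (a score plus a decoy flag, with higher scores meaning better matches) are assumptions made for this example.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    scores: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Estimate q-values from (score, is_decoy) pairs; higher score = better.

    Returns the entries sorted by descending score, each with its q-value.
    """
    ranked = sorted(scores, key=lambda s: s[0], reverse=True)

    # FDR at each score threshold: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from the paper.
    demo = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in target_decoy_qvalues(demo):
        print(score, is_decoy, q)
```

Filtering each run at a fixed q-value cutoff independently can still let false positives accumulate when many runs are combined, which is the problem the abstract's global constraint is meant to address.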
Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences
Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated updates and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.