Catalogue Search | MBRL

Normalization and missing value imputation for label-free LC-MS analysis

by Karpievitch, Yuliya V , Smith, Richard D , Dabney, Alan R in Algorithms , Bias , Bioinformatics

2012

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

Journal Article

LIQUID CHROMATOGRAPHY MASS SPECTROMETRY-BASED PROTEOMICS: BIOLOGICAL AND TECHNOLOGICAL ASPECTS

by Polpitiya, Ashoka D. , Dabney, Alan R. , Anderson, Gordon A. in 60 APPLIED LIFE SCIENCES , ABUNDANCE , Amino acids

2010

Mass spectrometry-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. Though recent years have seen a tremendous improvement in instrument performance and the computational tools used, significant challenges remain, and there are many opportunities for statisticians to make important contributions. In the most widely used "bottom-up" approach to proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, the resulting peptide products are separated based on chemical or physical properties and analyzed using a mass spectrometer. The two fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (1) Identifying the proteins that are present in a sample, and (2) Quantifying the abundance levels of the identified proteins. Both of these challenges require knowledge of the biological and technological context that gives rise to observed data, as well as the application of sound statistical principles for estimation and inference. We present an overview of bottom-up proteomics and outline the key statistical issues that arise in protein identification and quantification.

Journal Article

An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

by Dabney, Alan R. , Karpievitch, Yuliya V. , Hill, Elizabeth G. in Algorithms , Alzheimer's disease , Alzheimers disease

2009

Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size which led to poorer classification and variable selection accuracy. Perhaps most importantly our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license.
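
The idea of subject-level bootstrapping described in this abstract can be illustrated with a minimal sketch. It assumes that "subject-level" means whole subjects are resampled with replacement and that every replicate of a drawn subject enters the bootstrap sample together. The function name `subject_level_bootstrap` and its arguments are hypothetical and are not taken from the RF++ source code, which may differ in detail.

```python
import random
from collections import defaultdict


def subject_level_bootstrap(subject_ids, rng=None):
    """Resample whole subjects with replacement (illustrative sketch only).

    subject_ids: one subject label per observation (replicates share a label).
    Returns (in_bag, out_of_bag) lists of observation indices.
    """
    rng = rng or random.Random()

    # Group observation indices by the subject they belong to.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = list(by_subject)

    # Draw as many subjects as there are, with replacement.
    drawn = [rng.choice(subjects) for _ in subjects]

    # Every replicate of a drawn subject goes in-bag together.
    in_bag = [i for sid in drawn for i in by_subject[sid]]

    # Subjects never drawn are entirely out-of-bag, so no subject
    # contributes replicates to both the in-bag and out-of-bag sets.
    drawn_set = set(drawn)
    out_of_bag = [
        i for sid in subjects if sid not in drawn_set for i in by_subject[sid]
    ]
    return in_bag, out_of_bag


# Hypothetical data: three subjects with 2, 2, and 3 replicates.
ids = ["a", "a", "b", "b", "c", "c", "c"]
in_bag, out_of_bag = subject_level_bootstrap(ids, random.Random(0))
```

Because out-of-bag observations come only from subjects that were never drawn, an error estimate computed on them is not contaminated by replicates of in-bag subjects, which is the overoptimism the abstract attributes to observation-level resampling.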

Journal Article

Optimality Driven Nearest Centroid Classification from Genomic Data

by Storey, John D. , Dabney, Alan R. in Algorithms , Analysis , Bioinformatics

2007

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.

Journal Article

The Relationship Between a Student’s Success in First-Semester General Chemistry and Their Mathematics Fluency, Profile, and Performance on Common Questions

by Williamson, Vickie M. , Chuu, Eric , Dabney, Alan R. in At Risk Students , Chemistry , Colleges & universities

2022

In an effort to investigate the factors that lead to success in general chemistry, the Math-Up Skills Test (MUST) and common questions were used along with a student characteristic questionnaire. The MUST is a 20-item instrument to measure mathematics fluency, which is done without a calculator with a 15-min time limit. It has been shown as a valid predictor of successful grades in general chemistry I and II (grades of A, B, or C). A large amount of data was collected from 1020 general chemistry students from six southwestern universities, including MUST score, demographic questions, common examination questions, and course performance as measured by final exams and course grades. The common questions were drawn from databases that had established statistics, reliability, and validity. Six topics were chosen for the first-semester common questions: the combined gas laws, frequency and wavelength of light, unit conversions, stoichiometry, enthalpy, and limiting reagents. Two questions were selected for each topic, one algorithmic (mathematical) and one conceptual. Relationships among the variables were investigated by statistical analysis to generate linear and logistic regression models to predict student success. An interesting finding was the strong relationship between the average course grade and number of common questions answered correctly. The predictability of identifying at-risk students was analyzed for the MUST and the common questions. Respective correlations with the course grade were established. The study concluded that the common questions were the better predictor of success but that the MUST can more effectively be used to predict class performance because it can be given as a single-use test early in the semester.

Journal Article

Efficacy of the chemical trifluoromethanesulfonamide as a male gametocide in field-grown sorghum Sorghum bicolor (L.) Moench

by Hlavinka, Kyle B , Hodnett, George L , Boerman, Nicholas A in Cross-pollination , Cytoplasmic male sterility , Dosage

2019

Sorghum bicolor (L.) Moench is a cereal grain and forage crop that is grown across tropical and temperate regions of the world. Sorghum has a complete flower, resulting in self-pollination as the primary form of reproduction, but it is also grown commercially as a hybrid. Consequently, methods of cross-pollination for both breeding and hybrid seed production are important. In sorghum breeding, current methods of cross-pollination are effective, but they have limitations in terms of achieving complete and temporal male sterility. With the development of new breeding approaches, such as doubled haploids, temporal male sterility is essential. Temporal male sterility would also be useful in testing new seed parent lines prior to an investment in sterilization of the line in the cytoplasmic male sterility system. The objective of this study was to evaluate the efficacy of trifluoromethanesulfonamide (TFMSA) as a sorghum male gametocide under field conditions. TFMSA was foliarly applied to three male-fertile parental lines, in two environments, using a pipette and a sprayer, respectively, in dosages ranging from 5 to 30 mg/plant. Repeated applications over time for the 10 and 15 mg dosage rates were conducted on a subset of individual plants. The results indicate that once a minimum dosage threshold (between 10 and 15 mg) was reached, panicles became male sterile. Additional dosages and number of applications had little overall effect, and both hand-applied and sprayer-applied TFMSA had similar male sterility induction capability. From these studies, it appears that TFMSA can be used as an effective chemical male gametocide on sorghum under field conditions.

Journal Article

RNA-seq of serial kidney biopsies obtained during progression of chronic kidney disease from dogs with X-linked hereditary nephropathy

by Hokamp, Jessica A. , Dabney, Alan R. , Nabity, Mary B. in 45/90 , 631/208/212/2019 , 631/443/272

2017

Dogs with X-linked hereditary nephropathy (XLHN) have a glomerular basement membrane defect that leads to progressive juvenile-onset renal failure. Their disease is analogous to Alport syndrome in humans, and they also serve as a good model of progressive chronic kidney disease (CKD). However, the gene expression profile that affects progression in this disease has only been partially characterized. To help fill this gap, we used RNA sequencing to identify differentially expressed genes (DEGs), over-represented pathways, and upstream regulators that contribute to kidney disease progression. Total RNA from kidney biopsies was isolated at 3 clinical time points from 3 males with rapidly-progressing CKD, 3 males with slowly-progressing CKD, and 2 age-matched controls. We identified 70 DEGs by comparing rapid and slow groups at specific time points. Based on time course analysis, 1,947 DEGs were identified over the 3 time points revealing upregulation of inflammatory pathways: integrin signaling, T cell activation, and chemokine and cytokine signaling pathways. T cell infiltration was verified by immunohistochemistry. TGF-β1 was identified as the primary upstream regulator. These results provide new insights into the underlying molecular mechanisms of disease progression in XLHN, and the identified DEGs can be potential biomarkers and therapeutic targets translatable to all CKDs.

Journal Article

Author Correction: RNA-seq of serial kidney biopsies obtained during progression of chronic kidney disease from dogs with X-linked hereditary nephropathy

by Hokamp, Jessica A. , Dabney, Alan R. , Nabity, Mary B. in Author , Author Correction , Humanities and Social Sciences

2020

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Journal Article

Predicting “Heart Age” Using Electrocardiography

by Schlegel, Todd , Starc, Vito , Feiveson, Alan in Precision medicine

2014

Knowledge of a patient’s cardiac age, or “heart age”, could prove useful to both patients and physicians for better encouraging lifestyle changes potentially beneficial for cardiovascular health. This may be particularly true for patients who exhibit symptoms but who test negative for cardiac pathology. We developed a statistical model, using a Bayesian approach, that predicts an individual’s heart age based on his/her electrocardiogram (ECG). The model is tailored to healthy individuals, with no known risk factors, who are at least 20 years old and for whom a resting ~5 min 12-lead ECG has been obtained. We evaluated the model using a database of ECGs from 776 such individuals. Secondarily, we also applied the model to other groups of individuals who had received 5-min ECGs, including 221 with risk factors for cardiac disease, 441 with overt cardiac disease diagnosed by clinical imaging tests, and a smaller group of highly endurance-trained athletes. Model-related heart age predictions in healthy non-athletes tended to center around body age, whereas about three-fourths of the subjects with risk factors and nearly all patients with proven heart diseases had higher predicted heart ages than true body ages. The model also predicted somewhat higher heart ages than body ages in a majority of highly endurance-trained athletes, potentially consistent with possible fibrotic or other anomalies recently noted in such individuals.

Journal Article

Issues in the mapping of two diseases

by Wakefield, Jon C , Dabney, Alan R in Aged , Bladder cancer , Cancer

2005

Recently, there has been increased interest in the geographical modelling of two or more diseases. In this article, we consider a number of issues relating to such an endeavour including the standardization process and the comparison of univariate and bivariate disease mapping models. A principal motivation for the examination of two or more diseases is to discover similarities or dissimilarities in the geographical distribution of risk. In this article, we propose a proportional mortality approach to give clues to areas of similarity and dissimilarity. A secondary aim of bivariate modelling is to ‘borrow strength’ between diseases in order to provide better estimates of risk in each area. We will illustrate various modelling strategies using incidence data from 1996 to 2000 on lung and bladder cancer in Washington state.

Journal Article
