Optimally splitting cases for training and testing high dimensional classifiers
by Simon, Richard M; Dobbin, Kevin K
in Algorithms / Analysis of Variance / Artificial Intelligence / Bioinformatic and algorithmical studies / Biomedical and Life Sciences / Biomedicine / Data Interpretation, Statistical / Gene Expression / Gene Expression Profiling / Human Genetics / Humans / Microarrays / Neoplasms - genetics / Oligonucleotide Array Sequence Analysis / Research Article / Statistics, Nonparametric
2011
Journal Article
Overview
Background
We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate?
Results
We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts.
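The question of how the training proportion affects the MSE of the accuracy estimate can be illustrated with a small simulation. The sketch below is not the authors' algorithm and does not reproduce their three-part decomposition; it is a toy model written under stated assumptions. The Gaussian data model, the nearest-centroid rule (one example of a linear classifier), all function names, and all parameter values are hypothetical. It also assumes that the quantity being estimated is the accuracy of the classifier built on all n cases, which in a simulation can be approximated on a large independent sample.

```python
import random
import statistics


def simulate(n, p, n_informative, delta, rng):
    """Draw n cases with p Gaussian features; the first n_informative
    features are shifted by delta in class 1 (hypothetical toy model)."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels to keep the classes balanced
        x = [rng.gauss(delta * label if j < n_informative else 0.0, 1.0)
             for j in range(p)]
        data.append((x, label))
    return data


def fit_centroid_rule(train):
    """Nearest-centroid (linear) classifier: returns weights and threshold."""
    p = len(train[0][0])
    means = []
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        means.append([sum(r[j] for r in rows) / len(rows) for j in range(p)])
    w = [m1 - m0 for m0, m1 in zip(*means)]
    c = sum(wj * (m0 + m1) / 2 for wj, m0, m1 in zip(w, *means))
    return w, c


def accuracy(rule, cases):
    """Fraction of cases whose predicted class matches the label."""
    w, c = rule
    hits = sum((sum(wj * xj for wj, xj in zip(w, x)) > c) == bool(y)
               for x, y in cases)
    return hits / len(cases)


def split(data, train_fraction, rng):
    """Stratified random split so both classes appear in both sets."""
    train, test = [], []
    for label in (0, 1):
        rows = [d for d in data if d[1] == label]
        rng.shuffle(rows)
        k = max(1, min(len(rows) - 1, round(train_fraction * len(rows))))
        train += rows[:k]
        test += rows[k:]
    return train, test


def mse_of_split(train_fraction, n=60, p=30, n_informative=5, delta=1.0,
                 n_datasets=20, n_reference=400, seed=0):
    """Monte Carlo estimate of the MSE of the test-set accuracy estimate."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(n_datasets):
        data = simulate(n, p, n_informative, delta, rng)
        reference = simulate(n_reference, p, n_informative, delta, rng)
        # Assumed target: accuracy of the rule built on all n cases,
        # approximated on a large independent sample (simulation only).
        target = accuracy(fit_centroid_rule(data), reference)
        train, test = split(data, train_fraction, rng)
        estimate = accuracy(fit_centroid_rule(train), test)
        sq_errors.append((estimate - target) ** 2)
    return statistics.fmean(sq_errors)


if __name__ == "__main__":
    for fraction in (0.5, 2 / 3, 0.8):
        print(f"train fraction {fraction:.2f}: "
              f"MSE ~ {mse_of_split(fraction):.4f}")
```

With so few simulated datasets the printed values are noisy, so the ranking of the three proportions should not be read as a result; raising n_datasets and varying n and delta gives a rough feel for the trade-off between a small training set and a small test set.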
Conclusions
By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in a larger share of cases assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
Publisher
BioMed Central, BioMed Central Ltd, BMC