The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

by Dunbrack, Roland L.; Wei, Qiong

in Accuracy / Algorithms / Analysis / Animals / Artificial Intelligence / Bioinformatics / Biology / Cancer / Classifiers / Computational Biology - methods / Computer Science / Correlation coefficient / Correlation coefficients / Data points / Databases, Genetic / Datasets / Genetic Association Studies / Genomes / Genomics / Genotype & phenotype / Humans / Learning algorithms / Machine learning / Mathematical models / Medical research / Missense mutation / Models, Biological / Mutation / Mutation, Missense / Oversampling / Phenotype / Polymorphism, Genetic / Proteins / Reproducibility of Results / Social and Behavioral Sciences / Teaching methods / Training

2013

Journal Article

Overview

Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand.
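The metrics and the undersampling step named in the overview can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names, the 0/1 label encoding (1 = deleterious, 0 = neutral), and the fixed random seed are all assumptions made for the example.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Average of True Positive Rate and True Negative Rate.

    Assumes both classes are present in the test data.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(samples, labels, seed=0):
    """Randomly drop majority-class items until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy example: 4 positives and 6 negatives with hypothetical predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(balanced_accuracy(*counts), matthews_corrcoef(*counts))

# Balance a toy imbalanced training set (2 positives, 8 negatives) to 50/50.
features = list(range(10))
train_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
bal_x, bal_y = undersample_majority(features, train_labels)
print(bal_y.count(1), bal_y.count(0))
```

Undersampling discards data, so with a small data set one of the other techniques listed above (oversampling, interpolation, or class-weighted penalties) may be preferable.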

Publisher

Public Library of Science (PLoS)

Subject

Accuracy

/ Algorithms

/ Analysis

/ Animals

/ Artificial Intelligence

/ Bioinformatics

/ Biology