Catalogue Search | MBRL
Explore the vast range of titles available.
4 result(s) for "Training set partition"
A graphical heuristic for reduction and partitioning of large datasets for scalable supervised training
2019
A scalable graphical method is presented for selecting and partitioning datasets for the training phase of a classification task. The heuristic requires a clustering algorithm whose computational cost stays in reasonable proportion to the training task itself. This step is followed by the construction of an information graph of the underlying classification patterns using approximate nearest neighbor methods. The method comprises two approaches: one for reducing a given training set, and another for partitioning the selected/reduced set. The heuristic targets large datasets, since the primary goal is a significant reduction in training run-time without compromising prediction accuracy. Test results show that both approaches significantly speed up training compared with the state-of-the-art shrinking heuristics available in LIBSVM, while closely matching or even exceeding them in prediction accuracy. A network design is also presented for a partitioning-based distributed training formulation, which yields additional speed-up in training run-time over a serial implementation of the approaches.
Journal Article
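The boundary-focused reduction described in the abstract above can be illustrated with a toy sketch. This is not the paper's graphical heuristic: `reduce_training_set` is a hypothetical helper that keeps only points whose nearest neighbour carries a different label, a crude stand-in for the "information graph of classification patterns" idea.

```python
from math import dist

def reduce_training_set(points, labels):
    """Toy reduction: keep each point whose nearest neighbour (by
    Euclidean distance) has a different label, i.e. points near a
    class boundary; interior points are dropped."""
    kept = []
    for i, p in enumerate(points):
        # index of the nearest *other* training point
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: dist(p, points[k]))
        if labels[j] != labels[i]:
            kept.append(i)
    return kept

# Two 1-D classes meeting near x = 2.25: only the two points
# straddling the boundary survive the reduction.
pts = [(0.0,), (1.0,), (2.0,), (2.5,), (4.0,), (5.0,)]
labs = [0, 0, 0, 1, 1, 1]
print(reduce_training_set(pts, labs))
```

A real implementation at the scale the paper targets would replace the exact nearest-neighbour search with an approximate one, as the abstract notes.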
An Efficient Partition of Training Data Set Improves Speed and Accuracy of Cascade-correlation Algorithm
1997
This study extends an application of the efficient partition algorithm (EPA) to artificial neural network ensembles trained with the Cascade-Correlation algorithm. We show that EPA makes it possible to decrease the number of cases in the learning and validation data sets. The predictive ability of the ensemble, calculated over the whole data set, is not affected and in some cases is even improved. It is also shown that the distribution of cases selected by this method is proportional to the second derivative of the analyzed function.
Journal Article
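The closing observation, that selected cases concentrate where the second derivative of the analyzed function is large, can be made concrete with a finite-difference curvature estimate. This is a hypothetical sketch, not the paper's EPA implementation: `second_derivative_weights` only computes the |f''| values that would drive such a selection.

```python
def second_derivative_weights(xs, ys):
    """Estimate |f''| at interior sample points with a central finite
    difference that also handles unevenly spaced xs; larger weights
    mark regions where curvature-proportional selection would sample
    more densely."""
    w = []
    for i in range(1, len(xs) - 1):
        h1, h2 = xs[i] - xs[i - 1], xs[i + 1] - xs[i]
        d2 = 2 * (ys[i - 1] / (h1 * (h1 + h2))
                  - ys[i] / (h1 * h2)
                  + ys[i + 1] / (h2 * (h1 + h2)))
        w.append(abs(d2))
    return w

# For f(x) = x^2 the second derivative is the constant 2,
# so every interior point gets the same weight.
print(second_derivative_weights([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))
```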
SPXYE: an improved method for partitioning training and validation sets
by Jia, Zhizhen; Fang, Chao; Gao, Ting
in Algorithms, Calibration, Computer Communication Networks
2019
This study proposes a sample selection strategy termed SPXYE (sample set partitioning based on joint X–Y–E distances) for data partition in multivariate modeling, where training and validation sets are required. The method chooses the training set according to the X (the independent variables), Y (the dependent variables), and E (the error of the preliminarily calculated results with respect to the dependent variables) spaces. This selection strategy provides a valuable tool for multivariate calibration. The proposed SPXYE technique was applied to three household chemical molecular databases to obtain training and validation sets for partial least squares (PLS) modeling. For comparison, the training and validation sets were also generated using random sampling, Kennard–Stone, and sample set partitioning based on joint X–Y distances methods. The predictions of all associated PLS regression models were performed on the same testing set, which was different from both the training set and the validation set. The results indicated that the proposed SPXYE strategy may serve as an alternative partition strategy.
Journal Article
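The family of methods this abstract builds on can be sketched as a Kennard–Stone-style selection over a joint X–y distance (the "joint X–Y distances" idea; SPXYE would add a third, error-based term). This is a simplified illustration under assumed normalisation, not the paper's SPXYE: `spxy_like_select` is a hypothetical helper.

```python
from math import dist

def spxy_like_select(X, y, n_select):
    """Greedy Kennard–Stone-style selection on a joint X–y distance:
    each pairwise X distance and y distance is normalised by its
    maximum, summed, and samples are picked to maximise their minimum
    distance to the already-selected set."""
    n = len(X)
    dx = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    mx = max(max(r) for r in dx) or 1.0
    my = max(max(r) for r in dy) or 1.0
    d = [[dx[i][j] / mx + dy[i][j] / my for j in range(n)] for i in range(n)]
    # seed with the two most distant samples
    i0, j0 = max(((i, j) for i in range(n) for j in range(n)),
                 key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    while len(selected) < n_select:
        rest = [k for k in range(n) if k not in selected]
        # next sample: farthest (in joint distance) from the selected set
        nxt = max(rest, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
    return selected

# Picks the two extremes first, then the remaining point farthest
# from both in the joint X–y space.
print(spxy_like_select([(0.0,), (1.0,), (2.0,), (10.0,)],
                       [0.0, 1.0, 2.0, 10.0], 3))
```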
Voting over Multiple Condensed Nearest Neighbors
1997
Lazy learning methods like the k-nearest neighbor classifier require storing the whole training set and may be too costly when this set is large. The condensed nearest neighbor classifier incrementally stores a subset of the sample, decreasing storage and computation requirements. We propose to train multiple such subsets and take a vote over them, thus combining predictions from a set of concept descriptions. We investigate two voting schemes: simple voting, where voters have equal weight, and weighted voting, where weights depend on the classifiers' confidence in their predictions. We also consider how to form such subsets for improved performance. When the training set is small, voting improves performance considerably; if the training set is not small, the voters converge to similar solutions and nothing is gained by voting. To alleviate this, when the training set is of intermediate size we use bootstrapping to generate smaller training sets over which the voters are trained; when the training set is large, we partition it into smaller, mutually exclusive subsets and then train the voters. Simulation results on six datasets are reported, with good outcomes. We also give a review of methods for combining multiple learners. The idea of taking a vote over multiple learners can be applied with any type of learning scheme.
Journal Article
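The two building blocks of this abstract, condensing a training set and equal-weight voting over several condensed subsets, can be sketched as follows. The condensing step follows Hart's classic condensed nearest neighbour rule; `vote_predict` is a hypothetical helper for the simple (unweighted) voting scheme.

```python
from math import dist
from collections import Counter

def condense(points, labels):
    """Hart's condensed nearest neighbour: grow a subset until every
    training point is classified correctly by 1-NN over the subset."""
    store = [0]                      # seed with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in store:
                continue
            nearest = min(store, key=lambda s: dist(points[i], points[s]))
            if labels[nearest] != labels[i]:
                store.append(i)      # misclassified: add to the subset
                changed = True
    return store

def vote_predict(voters, points, labels, x):
    """Simple voting: each condensed subset casts one equal-weight
    1-NN vote; the majority label wins."""
    votes = []
    for store in voters:
        nearest = min(store, key=lambda s: dist(x, points[s]))
        votes.append(labels[nearest])
    return Counter(votes).most_common(1)[0][0]

# Two 1-D classes: condensing keeps one prototype per class,
# and a single voter classifies a query near the second class.
pts = [(0.0,), (1.0,), (5.0,), (6.0,)]
labs = [0, 0, 1, 1]
subset = condense(pts, labs)
print(subset, vote_predict([subset], pts, labs, (5.5,)))
```

In the paper's large-data setting, each voter would instead be condensed from its own bootstrap sample or mutually exclusive partition.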