Extending K-Means
Dissertation
2019
Overview
In the unsupervised learning setting, where data labels are not available and few constraints are put on data structure before analysis, having a robust procedure is paramount for any method tasked with analyzing that data. This dissertation presents three distinct papers, each of which provides a mechanism for extending the use cases of the K-means algorithm.

Paper 1. Clustering is a difficult problem that is further challenged in higher dimensions, where some of the information can be redundant. Such redundancy can take the form of dimensions that carry group information already present in other variables, or that are simply irrelevant and contribute no useful information with regard to clustering. The K-means algorithm is arguably the most widely used clustering tool, but its performance is degraded by the presence of redundant dimensions. We provide a formal approach to identifying and removing these redundant features and demonstrate improved performance, as well as improved interpretability of the derived groupings. Our methodology also simultaneously estimates the number of groups while selecting dimensions informative for clustering. We evaluate performance on datasets simulated under many complexities and conditions and on a set of handwritten digits, and we use the method to identify different running styles among participants in a 100 km ultra-marathon race.

Paper 2. The K-means algorithm is extended to allow for partitioning of skewed groups. Our algorithm, called TiK-Means, is a K-means type algorithm that assigns observations to groups while estimating their skewness-transformation parameters. The resulting groups and transformation reveal general-structured clusters that can be explained by inverting the estimated transformation. Further, a modification of the jump statistic chooses the number of groups. Our algorithm is evaluated on simulated and real-life datasets and then applied to a long-standing astronomical dispute regarding the distinct kinds of gamma ray bursts.

Paper 3. This paper presents a method for processing handwritten documents and clustering components of the writing into groups based on structural attributes. The obtained cluster membership information is used to develop a statistical model for writer identification. The presented clustering algorithm creates a grouping structure for glyphs, which are small pieces of handwriting extracted using the handwriter R package developed by Berry. To facilitate the clustering of glyphs, a distance measure inspired by the graph edit distance and a method for calculating the center of a set of glyphs are both introduced. The clustering algorithm is applied to the MNIST dataset for demonstration and exploratory purposes. Various behaviors of the algorithm are explored using its relatively simple digit glyphs. We also establish a Bayesian hierarchical model for modeling a set of writers based on their propensity for writing glyphs that are assigned to certain clusters. We then perform a full-scale writer identification analysis on handwritten documents from 27 writers in the Computer Vision Lab dataset.
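For readers unfamiliar with the baseline that all three papers extend, the standard K-means procedure (Lloyd's algorithm) can be sketched as follows. This is an illustrative sketch only and is not taken from the dissertation: the function name `kmeans`, its parameters, and the random initialisation are assumptions, and none of the extensions described above (feature selection, skewness transformations, glyph distances) are implemented here.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; not the dissertation's code.
    """
    rng = random.Random(seed)
    # Initialise centers with k data points drawn at random (an assumption;
    # other initialisation schemes are common).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # under squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns one integer label per point together with the two fitted centers. Because the distance is plain Euclidean and every dimension is weighted equally, this baseline is sensitive to the redundant dimensions and skewed groups that the abstract identifies as motivation.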
Publisher
ProQuest Dissertations & Theses
Subject
ISBN
1392264081, 9781392264089