Asset Details
MbrlCatalogueTitleDetail
Do you wish to reserve the book?
Extending K-Means
by
Berry, Nicholas S
in
Statistics
2019
Hey, we have placed the reservation for you!
By the way, why not check out events that you can attend while you pick your title.
You are currently in the queue to collect this book. You will be notified once it is your turn to collect the book.
Oops! Something went wrong.
Looks like we were not able to place the reservation. Kindly try again later.
Do you wish to request the book?
Extending K-Means
by
Berry, Nicholas S
in
Statistics
2019
Please be aware that the book you have requested cannot be checked out. If you would like to checkout this book, you can reserve another copy
We have requested the book for you!
Your request is successful and it will be processed during the Library working hours. Please check the status of your request in My Requests.
Oops! Something went wrong.
Looks like we were not able to place your request. Kindly try again later.
Dissertation
Extending K-Means
2019
Request Book From Autostore
and Choose the Collection Method
Overview
In the unsupervised learning setting, where data labels are not available and few constraints are put on data structure before analysis, having a robust procedure is paramount for any method tasked with analyzing that data. This dissertation presents three distinct papers, each of which provides a mechanic for extending the use cases of the K-means algorithm. Paper 1. Clustering is a difficult problem that is further challenged in higher dimensions where some of the information can be redundant. Such redundancy can be in the form of dimensions that have group information that is already present in other variables, or are simply irrelevant and contribute no useful information with regard to clustering. The K-means algorithm is arguably the most widely used clustering tool, but its performance is degraded by the presence of redundant dimensions. We provide a formal approach to identifying and removing these redundant features and demonstrate improved performance, as well as interpretability of the derived groupings. Our methodology also simultaneously estimates the number of groups while selecting dimensions informative for clustering. We evaluate performance on datasets simulated under many complexities and conditions as well as on a set of handwritten digits, and is used to identify different running styles among participants in a 100 km ultra-marathon race. Paper 2. The K-means algorithm is extended to allow for partitioning of skewed groups. Our algorithm is called TiK-Means and contributes a K-means type algorithm that assigns observations to groups while estimating their skewness-transformation parameters. The resulting groups and transformation reveal general-structured clusters that can be explained by inverting the estimated transformation. Further, a modification of the jump statistic chooses the number of groups. Our algorithm is evaluated on simulated and real- life datasets and then applied to a long-standing astronomical dispute regarding the distinct kinds of gamma ray bursts. Paper 3. This paper presents a method for processing handwritten documents and clustering components of the writing into groups based on structural attributes. The obtained cluster membership information is used to develop a statistical model for writer identification. The presented clustering algorithm creates a grouping structure for glyphs, which are small pieces of handwriting extracted using the handwriter R package developed by Berry. To facilitate the clustering of glyphs, a distance measure inspired by the graph edit distance and a method for calculating the center of a set of glyphs are both introduced. The clustering algorithm is applied to the MNIST dataset for demonstration and exploratory purposes. Various behaviors of the algorithm are explored using its relatively simple digit glyphs. We also establish a Bayesian hierarchical model for modeling a set of writers based on their propensity for writing glyphs that are assigned to certain clusters. We then perform a full scale writer identification analysis on handwritten documents from 27 writers in the Computer Vision Lab dataset.
This website uses cookies to ensure you get the best experience on our website.