Catalogue Search | MBRL

Convex clustering analysis for histogram-valued data

by Wang, Yanning , Yoon, Young Joo , Park, Cheolwoo in Artificial Intelligence , biometry , Cluster Analysis

2019

In recent years, there has been increased interest in symbolic data analysis, including exploratory analysis, supervised and unsupervised learning, and time series analysis. Traditional statistical approaches designed for single-valued data are not suitable because they cannot incorporate the additional information on data structure available in symbolic data, and thus new techniques have been proposed for symbolic data to bridge this gap. In this article, we develop a regularized convex clustering approach for grouping histogram-valued data. Convex clustering is a relaxation of hierarchical clustering methods, where prototypes are grouped by having exactly the same value in each group via penalization of parameters. We apply two different distance metrics to measure (dis)similarity between histograms. Various numerical examples confirm that the proposed method performs better than its competitors.

Journal Article

A Novel Feature Representation and Clustering for Histogram-Valued Data

by Zhao, Qing , Wang, Huiwen in Algorithms , Analysis , Clustering

2025

In an era where large-scale data are produced and collected rapidly, symbolic data analysis has attracted great interest as a way to extract meaningful information from massive data. Recently, novel statistical techniques for histogram-valued data have been proposed and widely applied in fields where traditional methods are not suitable. However, existing research faces modeling challenges posed by the complicated expression and intrinsic constraints of histogram-valued data. In this work, we introduce a novel representation for a histogram that captures the location and shape information of the corresponding probability distribution. On this basis, an effective graph clustering method is developed to partition multivariate histogram-valued data by learning a high-quality similarity matrix. Simulation experiments and an empirical case analysis demonstrate that the proposed method significantly improves clustering performance for histogram-valued data and shows clear advantages over competing approaches.

Journal Article

Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering

by Umbleja, Kadri , Ichino, Manabu , Yaguchi, Hiroyuki in Clustering , compactness , Data models

2021

This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness for the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.

Journal Article

Sampling Based Histogram PCA and Its Mapreduce Parallel Implementation on Multicore

by Diday, Edwin , Wang, Cheng , Wang, Huiwen in Approximation , Computer simulation , Empirical analysis

2018

In existing principal component analysis (PCA) methods for histogram-valued symbolic data, projection results are approximated based on Moore’s algebra and fail to reflect the data’s true structure, mainly because there is no precise, unified calculation method for the linear combination of histogram data. In this paper, we propose a new PCA method for histogram data that distinguishes itself from various well-established methods in that it can project observations onto the space spanned by principal components more accurately and rapidly by sampling through a MapReduce framework. The new histogram PCA method is implemented under the same assumption of “orthogonal dimensions for every observation” as the existing literature. To project observations, the method first samples from the original histogram variables to acquire single-valued data, on which linear combination operations can be performed. Then, the projection of observations can be given by linear combination of loading vectors and single-valued samples, which is close to accurate projection results. Finally, the projection is summarized to histogram data. These procedures involve complex algorithms and large-scale data, which makes the new method time-consuming. To speed it up, we undertake a parallel implementation of the new method in a multicore MapReduce framework. A simulation study and an empirical study confirm that the new method is effective and time-saving.
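The three projection steps described in this abstract (sample, combine linearly, summarize) can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names, the `(edges, probs)` histogram layout, the uniform-within-bin sampling, and the reading of "orthogonal dimensions" as independent sampling per variable are all assumptions made for illustration. Centering and the MapReduce parallelization are omitted.

```python
import numpy as np

def sample_histogram(edges, probs, n, rng):
    """Draw n values: pick a bin by its probability, then sample uniformly inside it."""
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    bins = rng.choice(len(probs), size=n, p=probs / probs.sum())
    return rng.uniform(edges[bins], edges[bins + 1])

def project_observation(histograms, loadings, n=10_000, n_bins=10, seed=0):
    """Project one histogram-valued observation onto one principal component.

    histograms: list of (edges, probs), one pair per variable (hypothetical layout).
    loadings:   1-D loading vector, assumed to be already estimated elsewhere.
    """
    rng = np.random.default_rng(seed)
    # Step 1: sample single-valued data, each variable independently (assumption).
    samples = np.column_stack(
        [sample_histogram(e, p, n, rng) for e, p in histograms]
    )
    # Step 2: linear combination of the loading vector and the samples.
    scores = samples @ np.asarray(loadings, dtype=float)
    # Step 3: summarize the projected samples back into a histogram.
    counts, edges = np.histogram(scores, bins=n_bins)
    return edges, counts / counts.sum()
```

Because each observation is projected independently, the per-observation call is the natural unit to distribute across workers, which is consistent with the parallel implementation the abstract mentions.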

Journal Article

Symbolic data analysis tools for recommendation systems

by Leite Dantas Bezerra, Byron , Tenorio de Carvalho, Francisco de Assis in Applied sciences , Case studies , Collaboration

2011

Recommender systems have become an important tool to cope with the information overload problem by acquiring data about user behavior. After tracing the user’s behavior, through actions or ratings, computational recommender systems use information-filtering techniques to recommend items. In order to recommend new items, one of the three major approaches is generally adopted: content-based filtering, collaborative filtering, or hybrid filtering. This paper presents three information-filtering methods, each of them based on one of these approaches. In our methods, the user profile is built up through symbolic data structures and the user and item correlations are computed through dissimilarity functions adapted from the symbolic data analysis (SDA) domain. The use of SDA tools has improved the performance of recommender systems, particularly on the “find good items” task as measured by the half-life utility metric, when there is not much information about the user.

Journal Article

Unsupervised Feature Selection for Histogram-Valued Symbolic Data by Hierarchical Conceptual Clustering

by Manabu Ichino , Kadri Umbleja , Hiroyuki Yaguchi in algebra_number_theory , compactness , HA1-4737

2021

Journal Article

Principal component histograms from interval-valued observations

by Le-Rademacher, J. , Billard, L. in Computational mathematics , Construction , Economic Theory/Quantitative Economics/Mathematical Methods

2013

The focus of this paper is to propose an approach to construct histogram values for the principal components of interval-valued observations. Le-Rademacher and Billard (J Comput Graph Stat 21:413–432, 2012) show that for a principal component analysis on interval-valued observations, the resulting observations in principal component space are polytopes formed by the convex hulls of linearly transformed vertices of the observed hyper-rectangles. In this paper, we propose an algorithm to translate these polytopes into histogram-valued data to provide numerical values for the principal components to be used as input in further analysis. Other existing methods of principal component analysis for interval-valued data construct the principal components themselves as intervals, which implicitly assumes that all values within an observation are uniformly distributed along the principal component axes. However, this assumption is only true in special cases where the variables in the dataset are mutually uncorrelated. Representation of the principal components as histogram values proposed herein more accurately reflects the variation in the internal structure of the observations in a principal component space. As a consequence, subsequent analyses using histogram-valued principal components as input result in improved accuracy.
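To make the geometry in this abstract concrete, the following sketch enumerates the vertices of one observed hyper-rectangle and applies a linear transformation to them; the convex hull of the transformed vertices is the polytope the abstract refers to. The second function is not the paper's algorithm: it is a Monte Carlo stand-in that assumes points are uniformly distributed inside the hyper-rectangle, and it only illustrates why the projected values need not be uniform along a principal component axis. All function names are hypothetical.

```python
import itertools
import numpy as np

def rectangle_vertices(intervals):
    """All 2^p corner points of a hyper-rectangle given as [(low, high), ...]."""
    return np.array(list(itertools.product(*intervals)), dtype=float)

def project_vertices(intervals, loadings):
    """Linearly transform the vertices; their convex hull is the polytope in PC space."""
    return rectangle_vertices(intervals) @ np.asarray(loadings, dtype=float)

def pc_histogram(intervals, loading, n=20_000, n_bins=8, seed=0):
    """Monte Carlo stand-in: histogram of one PC score for a single observation.

    Assumes a uniform distribution inside the hyper-rectangle (an assumption of
    this sketch, not a statement about the paper's algorithm).
    """
    rng = np.random.default_rng(seed)
    lows, highs = np.array(intervals, dtype=float).T
    points = rng.uniform(lows, highs, size=(n, len(intervals)))
    scores = points @ np.asarray(loading, dtype=float)
    counts, edges = np.histogram(scores, bins=n_bins)
    return edges, counts / n
```

For a rectangle such as `[(0, 1), (2, 4)]` and a loading vector with equal weights, the resulting histogram puts more mass in the middle bins than at the ends, which is the kind of internal variation an interval-valued principal component would hide.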

Journal Article
