7 result(s) for "histogram-valued data"
Convex clustering analysis for histogram-valued data
In recent years, there has been increased interest in symbolic data analysis, including for exploratory analysis, supervised and unsupervised learning, time series analysis, etc. Traditional statistical approaches that are designed to analyze single-valued data are not suitable because they cannot incorporate the additional information on data structure available in symbolic data, and thus new techniques have been proposed for symbolic data to bridge this gap. In this article, we develop a regularized convex clustering approach for grouping histogram-valued data. The convex clustering is a relaxation of hierarchical clustering methods, where prototypes are grouped by having exactly the same value in each group via penalization of parameters. We apply two different distance metrics to measure (dis)similarity between histograms. Various numerical examples confirm that the proposed method shows better performance than other competitors.
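For orientation, the convex clustering objective as it is usually written for single-valued observations is shown below. The symbols (observations $x_i$, prototypes $u_i$, weights $w_{ij}$, tuning parameter $\lambda$, norm index $q$) are assumptions taken from the general convex clustering literature, not from this article; the abstract does not state how the histogram distances enter the objective.

```latex
\min_{u_1,\dots,u_n}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_q
```

As $\lambda$ grows, the penalty pulls prototypes together until some satisfy $u_i = u_j$ exactly; observations whose prototypes coincide form one cluster, which matches the abstract's description of grouping "by having exactly the same value in each group".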
A Novel Feature Representation and Clustering for Histogram-Valued Data
In an era where large-scale data are produced and collected rapidly, great interest is attributed to symbolic data analysis in order to explore connotative and significant information from massive data. Recently, novel statistical techniques for histogram-valued data have been proposed and widely applied in various fields where traditional methods are not suitable. However, existing research has to face challenges in modeling posed by the complicated expression and intrinsic constraints of histogram-valued data. In this work, we introduce a novel representation for a histogram that captures the location and shape information of the corresponding probability distribution. On this basis, an effective graph clustering method is developed to partition multivariate histogram-valued data by learning a high-quality similarity matrix. Simulation experiments and empirical case analysis demonstrate that the proposed method significantly improves clustering performance for histogram-valued data and presents obvious advantages compared with competing approaches.
Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering
This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness for the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.
Sampling Based Histogram PCA and Its Mapreduce Parallel Implementation on Multicore
In existing principal component analysis (PCA) methods for histogram-valued symbolic data, projection results are approximated based on Moore’s algebra and fail to reflect the data’s true structure, mainly because there is no precise, unified calculation method for the linear combination of histogram data. In this paper, we propose a new PCA method for histogram data that distinguishes itself from various well-established methods in that it can project observations onto the space spanned by principal components more accurately and rapidly by sampling through a MapReduce framework. The new histogram PCA method is implemented under the same assumption of “orthogonal dimensions for every observation” as the existing literature. To project observations, the method first samples from the original histogram variables to acquire single-valued data, on which linear combination operations can be performed. Then, the projection of observations can be given by linear combination of loading vectors and single-valued samples, which is close to accurate projection results. Finally, the projection is summarized to histogram data. These procedures involve complex algorithms and large-scale data, which makes the new method time-consuming. To speed it up, we undertake a parallel implementation of the new method in a multicore MapReduce framework. A simulation study and an empirical study confirm that the new method is effective and time-saving.
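The three projection steps named in the abstract (sample, linearly combine, summarize) can be sketched roughly as follows. This is a single-process illustration, not the paper's MapReduce implementation. The function names, the (edges, probs) histogram encoding, the uniform-within-bin sampling, and the independent sampling of each variable are all assumptions made for the sketch.

```python
import numpy as np

def sample_histogram(edges, probs, n, rng):
    """Draw n values: pick a bin by its probability, then a uniform point inside it (assumed scheme)."""
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    bins = rng.choice(len(probs), size=n, p=probs / probs.sum())
    return rng.uniform(edges[bins], edges[bins + 1])

def project_observation(histograms, loading, n=10_000, n_bins=10, seed=0):
    """Project one histogram-valued observation onto one loading vector.

    histograms: list of (edges, probs) pairs, one per variable (hypothetical encoding).
    loading:    loading vector of a principal component, one weight per variable.
    """
    rng = np.random.default_rng(seed)
    # Step 1: sample single-valued data from each histogram variable
    # (variables are sampled independently here, which is an assumption).
    samples = np.column_stack(
        [sample_histogram(e, p, n, rng) for e, p in histograms]
    )
    # Step 2: linear combination of the samples with the loading vector.
    scores = samples @ np.asarray(loading, dtype=float)
    # Step 3: summarize the projected samples back into a histogram.
    counts, out_edges = np.histogram(scores, bins=n_bins)
    return out_edges, counts / counts.sum()

# Usage: two variables, loading vector chosen only for illustration.
obs = [([0.0, 1.0, 2.0], [0.5, 0.5]), ([10.0, 12.0], [1.0])]
edges, probs = project_observation(obs, loading=[1.0, 0.0])
```

Because each observation is processed independently, a loop over observations calling `project_observation` is the natural unit to distribute across workers, which is presumably where a map step would apply.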
Symbolic data analysis tools for recommendation systems
Recommender systems have become an important tool to cope with the information overload problem by acquiring data about user behavior. After tracing the user’s behavior, through actions or ratings, computational recommender systems use information-filtering techniques to recommend items. In order to recommend new items, one of three major approaches is generally adopted: content-based filtering, collaborative filtering, or hybrid filtering. This paper presents three information-filtering methods, each of them based on one of these approaches. In our methods, the user profile is built up through symbolic data structures and the user and item correlations are computed through dissimilarity functions adapted from the symbolic data analysis (SDA) domain. The use of SDA tools has improved the performance of recommender systems, particularly concerning the “find good items” task measured by the half-life utility metric, when there is not much information about the user.
Principal component histograms from interval-valued observations
The focus of this paper is to propose an approach to construct histogram values for the principal components of interval-valued observations. Le-Rademacher and Billard (J Comput Graph Stat 21:413–432, 2012) show that for a principal component analysis on interval-valued observations, the resulting observations in principal component space are polytopes formed by the convex hulls of linearly transformed vertices of the observed hyper-rectangles. In this paper, we propose an algorithm to translate these polytopes into histogram-valued data to provide numerical values for the principal components to be used as input in further analysis. Other existing methods of principal component analysis for interval-valued data construct the principal components themselves as intervals, which implicitly assumes that all values within an observation are uniformly distributed along the principal component axes. However, this assumption is only true in special cases where the variables in the dataset are mutually uncorrelated. Representation of the principal components as histogram values proposed herein more accurately reflects the variation in the internal structure of the observations in a principal component space. As a consequence, subsequent analyses using histogram-valued principal components as input result in improved accuracy.
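The geometric fact the abstract relies on, that an interval-valued observation is a hyper-rectangle whose vertices are linearly transformed into principal component space, can be illustrated with a small sketch. It covers only the vertex transformation, not the authors' algorithm for turning the resulting polytope into a histogram. The function name and the example loadings are hypothetical, and centering is omitted for brevity.

```python
import itertools
import numpy as np

def projected_vertices(intervals, loadings):
    """Map the 2^p vertices of a hyper-rectangle into principal component space.

    intervals: list of (low, high) pairs, one per variable.
    loadings:  p x k matrix whose columns are principal component directions.
    """
    vertices = np.array(list(itertools.product(*intervals)), dtype=float)
    return vertices @ np.asarray(loadings, dtype=float)

# One observation with two interval-valued variables.
intervals = [(0.0, 2.0), (1.0, 3.0)]
# Hypothetical loadings: a 45-degree rotation, standing in for fitted components.
c = np.sqrt(2) / 2
loadings = np.array([[c, -c],
                     [c,  c]])
pts = projected_vertices(intervals, loadings)
# The convex hull of the rows of pts is the observation's polytope in PC space;
# its extent along the first component runs from pts[:, 0].min() to pts[:, 0].max().
```

Reporting only that min-max range along each component is what an interval-valued principal component amounts to; the polytope itself carries more information about how the observation spreads across the components.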