Catalogue Search | MBRL

by SHI, Yifan , MA, Qianli , DONG, Xibin in Algorithms , clustering ensemble , Computer Science

2020

Despite significant successes in knowledge discovery, traditional machine learning methods may fail to perform satisfactorily on complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that these methods have difficulty capturing the multiple characteristics and underlying structure of such data. In this context, how to construct an effective and efficient knowledge discovery and mining model has become an important topic in the data mining field. Ensemble learning, an active research area, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning fuses the informative knowledge from these results, via adaptive voting schemes, to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream approaches to ensemble learning and classify them based on different characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce the combination of ensemble learning with other active machine learning areas such as deep learning and reinforcement learning.
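
Illustrative note (not part of the catalogued abstract): the final fusion step described above, in which weak predictive results are combined by voting, can be sketched as a plain plurality vote. The function name `majority_vote` and the toy predictions are assumptions made for this sketch; the adaptive weighting mentioned in the abstract is not modeled.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label lists into one label per sample by plurality vote.

    predictions: list of equal-length label lists, one list per weak model.
    """
    fused = []
    for votes in zip(*predictions):
        # most_common(1) returns the most frequent label; on a tie, the label
        # seen first (i.e. from the earliest model) wins.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three weak classifiers on four samples.
weak_predictions = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
fused_labels = majority_vote(weak_predictions)
```

A weighted variant would replace the raw counts with per-model weights; that choice depends on the specific ensemble method and is left open here.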

Journal Article

Elite fuzzy clustering ensemble based on clustering diversity and quality measures

by Hossinzadeh, Mehdi , Minaei-Bidgoli, Behrooz , Parvin, Hamid in Clustering , Quality , State of the art

2019

Despite some attempts to improve the quality of clustering ensemble methods, little research appears to have been devoted to the selection procedure within the fuzzy clustering ensemble. In addition, the quality and local diversity of base-clusterings are two important factors in the selection of base-clusterings. Very few studies have considered these two factors together when selecting the best fuzzy base-clusterings in the ensemble. We propose a novel fuzzy clustering ensemble framework based on a new fuzzy diversity measure and a fuzzy quality measure to find the base-clusterings with the best performance. Diversity and quality are defined based on the fuzzy normalized mutual information between fuzzy base-clusterings. In our framework, the final clustering of the selected base-clusterings is obtained by two types of consensus functions: (1) a fuzzy co-association matrix is constructed from the selected base-clusterings, and a single traditional clustering method such as hierarchical agglomerative clustering is then applied over the matrix as the consensus function to construct the final clustering; (2) a new graph-based fuzzy consensus function. The time complexity of the proposed consensus function is linear in the number of data objects. Experimental results reveal the effectiveness of the proposed approach compared to the state-of-the-art methods in terms of evaluation criteria on various standard datasets.
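
Illustrative note (not part of the catalogued abstract): consensus function (1) can be sketched as follows. The co-association definition used here (the mean, over base-clusterings, of the inner product of two objects' membership rows) is one common formulation and is an assumption; the abstract does not give the paper's exact definition. Average linkage is likewise an assumed choice of agglomerative rule, and all names and data are hypothetical.

```python
def fuzzy_coassociation(memberships):
    """memberships: list of base-clusterings, each an n x k list of
    membership rows (one row per object, degrees summing to 1)."""
    n = len(memberships[0])
    co = [[0.0] * n for _ in range(n)]
    for u in memberships:
        for i in range(n):
            for j in range(n):
                # Inner product of membership rows: high when objects i and j
                # place their weight on the same fuzzy clusters.
                co[i][j] += sum(p * q for p, q in zip(u[i], u[j]))
    m = len(memberships)
    return [[v / m for v in row] for row in co]


def average_linkage(co, n_clusters):
    """Agglomerative clustering over a similarity matrix (average linkage)."""
    clusters = [[i] for i in range(len(co))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = sum(co[i][j] for i in clusters[a] for j in clusters[b])
                sim /= len(clusters[a]) * len(clusters[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair
        del clusters[b]
    labels = [0] * len(co)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Two hypothetical fuzzy base-clusterings of four objects (cluster ids permuted).
u1 = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
u2 = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1], [0.7, 0.3]]
co = fuzzy_coassociation([u1, u2])
consensus = average_linkage(co, n_clusters=2)
```

Because the matrix is built from pairwise co-membership rather than raw cluster ids, the permuted labels in `u2` do not need to be aligned with those in `u1` before fusion.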

Journal Article

Examining unsupervised ensemble learning using spectroscopy data of organic compounds

by Massena, Djenerly G , He, Kedan in Algorithms , Bagging , Clustering

2023

One solution to the challenge of choosing an appropriate clustering algorithm is to combine different clusterings into a single consensus clustering result, known as a cluster ensemble (CE). This ensemble learning strategy can provide more robust and stable solutions across different domains and datasets. Unfortunately, not all clusterings in the ensemble contribute to the final data partition. Cluster ensemble selection (CES) aims at selecting a subset from a large library of clustering solutions to form a smaller cluster ensemble that performs as well as or better than the set of all available clustering solutions. In this paper, we investigate four CES methods for the categorization of structurally distinct organic compounds using high-dimensional IR and Raman spectroscopy data. The Single Quality Selection (SQI) method is used with various quality indices to form a subset of the ensemble by selecting the highest quality ensemble members. The Bagging method, usually applied in supervised learning, ranks ensemble members by calculating the normalized mutual information (NMI) between ensemble members and consensus solutions generated from a randomly sampled subset of the full ensemble. The hierarchical cluster and select method (HCAS-SQI) uses the diversity matrix of ensemble members to select a diverse set of ensemble members with the highest quality. Furthermore, a combining strategy (HCAS-MQI) can be used to combine subsets selected using multiple quality indices for the refinement of clustering solutions in the ensemble. The IR + Raman hybrid ensemble library is created by merging two complementary “views” of the organic compounds. This inherently more diverse library gives the best full ensemble consensus results. Overall, the Bagging method is recommended because it provides the most robust results, which are better than or comparable to the full ensemble consensus solutions.
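
Illustrative note (not part of the catalogued abstract): the NMI-based ranking of ensemble members can be sketched as below. The normalization 2·I(A;B) / (H(A) + H(B)) is one of several in common use and is an assumption, as are the function names and toy data. How the reference consensus solution is produced from a random subset of the ensemble is not modeled; `reference` is simply given.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a, b):
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return 2.0 * mutual_information(a, b) / (ha + hb)


def rank_by_nmi(members, reference):
    """Return member indices ordered from most to least similar to reference."""
    return sorted(range(len(members)),
                  key=lambda i: nmi(members[i], reference),
                  reverse=True)


# Hypothetical consensus solution and three ensemble members.
reference = [0, 0, 1, 1]
members = [
    [1, 1, 0, 0],  # same partition, labels swapped
    [0, 1, 0, 1],  # statistically independent of the reference
    [0, 0, 1, 0],  # partially agrees
]
ranking = rank_by_nmi(members, reference)
```

NMI is invariant to label permutation, which is why the first member ranks highest even though its cluster ids are swapped.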

Journal Article

Quantum Clustering Ensemble

by Deng, Ping , Tian, Peizhou , Jia, Shuang in Base clustering , Clustering ensemble , Quantum clustering ensemble

2021

Clustering ensemble combines several base clustering results into a definitive clustering solution that has better robustness, accuracy, and stability, and it can also be used in knowledge reuse, distributed computing, and privacy preservation. In this paper, we propose a novel quantum clustering ensemble (QCE) technique derived from quantum mechanics. The idea is that basic labels are associated with a vector in Hilbert space, and a scale-space probability function can be constructed for clustering ensemble. In detail, an operator in Hilbert space is represented by the Schrödinger equation, of which the probability function is a solution. Firstly, the base clustering results are regarded as new features of the original dataset, and they can be transformed into Hilbert space as vectors. Secondly, a QCE model is designed and the corresponding objective function is illustrated in detail. Furthermore, the objective function is derived and optimized to obtain its minimum, which is then used to determine the centers. Finally, 5 base clustering algorithms and 5 clustering ensemble algorithms are tested on 12 datasets in comparative experiments, and the experimental results show that QCE is very competitive and outperforms the state-of-the-art algorithms.

Journal Article

Accelerating Infinite Ensemble of Clustering by Pivot Features

by Hussain, Amir , Jin, Xiao-Bo , Xie, Guo-Sen in Algorithms , Artificial Intelligence , Biomedical and Life Sciences

2018

Infinite ensemble clustering (IEC) incorporates both ensemble clustering and representation learning by fusing infinite basic partitions and shows appealing performance in the unsupervised context. However, it needs to solve a linear equation system with a high time complexity of $O(d^3)$, where $d$ is the concatenated dimension of many clustering results. Inspired by the cognitive characteristic of human memory that can pay attention to the pivot features in a more compressed data space, we propose an accelerated version of IEC (AIEC) that extracts the pivot features and learns multiple mappings to reconstruct them, so that the linear equation system can be solved with time complexity $O(dr^2)$ ($r \ll d$). Experimental results on standard datasets, including image and text datasets, show that AIEC greatly improves the running time of IEC while achieving comparable clustering performance.

Journal Article

Incomplete multi-view clustering with multiple imputation and ensemble clustering

by Wang, Songtao , Li, Chunshan , Chu, Dianhui in Algorithms , Clustering , Data mining

2022

Multi-view clustering is an important and challenging task in machine learning and data mining. In the past decade, this topic has attracted much attention and much progress has been made in the field. However, in reality, due to factors such as machine error and sensor failure, multi-view data are mostly incomplete, so how to deal with this problem becomes a challenge. Existing works mainly deal with the view-missing case, in which all features of some samples are lost in a certain view of the dataset. In fact, missing values can occur at any position; this is the any-value-missing case, in which values may be missing from any view in a completely random way. We propose a two-stage algorithm involving multiple imputation and ensemble clustering to deal with multi-view clustering in the any-value-missing case. Multiple imputation is adopted to handle the missing values, and weighted ensemble clustering is applied to implement multi-view clustering. Experimental comparison on several data sets verifies the effectiveness of the proposed method.

Journal Article

Ensemble-based community detection in multilayer networks

by Tagarelli, Andrea , Amelio, Alessia , Gullo, Francesco in Artificial Intelligence , Chemistry and Earth Sciences , Clustering

2017

The problem of community detection in a multilayer network can effectively be addressed by aggregating the community structures separately generated for each network layer, in order to infer a consensus solution for the input network. To this end, clustering ensemble methods developed in the data clustering field are naturally of great support. Bringing these methods into a community detection framework would in principle represent a powerful and versatile approach to reach more stable and reliable community structures. Surprisingly, research on consensus community detection is still in its infancy. In this paper, we propose a novel modularity-driven ensemble-based approach to multilayer community detection. A key aspect is that it finds consensus community structures that not only capture prototypical community memberships of nodes, but also preserve the multilayer topology information and optimize the edge connectivity in the consensus via modularity analysis. Empirical evidence obtained on seven real-world multilayer networks sheds light on the effectiveness and efficiency of our proposed modularity-driven ensemble-based approach, which is shown to outperform state-of-the-art multilayer methods in terms of modularity, silhouette of community memberships, and redundancy assessment criteria, and also in terms of execution times.

Journal Article

Explainable AI-Based Ensemble Clustering for Load Profiling and Demand Response

by Sarmas, Elissaios , Fragkiadaki, Afroditi , Marinakis, Vangelis in Algorithms , Clustering , Consumer behavior

2024

Smart meter data provide an in-depth perspective on household energy usage. This research leverages such data to enhance demand response (DR) programs through a novel application of ensemble clustering. Despite its promising capabilities, our literature review identified a notable under-utilization of ensemble clustering in this domain. To address this shortcoming, we applied an advanced ensemble clustering method and compared its performance with traditional algorithms, namely, K-Means++, fuzzy K-Means, Hierarchical Agglomerative Clustering, Spectral Clustering, Gaussian Mixture Models (GMMs), BIRCH, and Self-Organizing Maps (SOMs), across a dataset of 5567 households for a range of cluster counts from three to nine. The performance of these algorithms was assessed using an extensive set of evaluation metrics, including the Silhouette Score, the Davies–Bouldin Score, the Calinski–Harabasz Score, and the Dunn Index. Notably, while ensemble clustering often ranked among the top performers, it did not consistently surpass all individual algorithms, indicating its potential for further optimization. Unlike approaches that seek the algorithmically optimal number of clusters, our method proposes a practical six-cluster solution designed to meet the operational needs of utility providers. For this case, the best performing algorithm according to the evaluation metrics was ensemble clustering. This study is further enhanced by integrating Explainable AI (xAI) techniques, which improve the interpretability and transparency of our clustering results.

Journal Article

A fuzzy clustering ensemble based on cluster clustering and iterative Fusion of base clusters

by Mohammadpoor, Majid , Nejatian, Samad , Mojarad, Musa in Agglomeration , Algorithms , Cluster analysis

2019

Clustering ensembles have emerged as a way to obtain more robust, novel, stable, and consistent clustering results. There are two kinds of approaches in clustering ensemble frameworks: (a) approaches that focus on the creation or preparation of a suitable ensemble, called ensemble creation approaches, and (b) approaches that try to find a suitable final clustering (also called a consensus clustering) out of a given ensemble, called ensemble aggregation approaches. The first kind addresses the ensemble creation problem; the second addresses the aggregation problem. This paper proposes an ensemble aggregator, or consensus function, called Robust Clustering Ensemble based on Sampling and Cluster Clustering (RCESCC). The RCESCC algorithm first generates an ensemble of fuzzy clusterings by running the fuzzy c-means algorithm on subsampled data. Then, it obtains a cluster-cluster similarity matrix from the fuzzy clusters. After that, it partitions the fuzzy clusters by applying a hierarchical clustering algorithm to the cluster-cluster similarity matrix. In the next phase, the RCESCC algorithm assigns the data points to the merged clusters. Experimental comparisons with state-of-the-art clustering algorithms indicate the effectiveness of the RCESCC algorithm in terms of performance, speed, and robustness.

Journal Article

GiniClust2: a cluster-aware, weighted ensemble clustering method for cell-type detection

by Tsoucas, Daphne , Yuan, Guo-Cheng in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2018

Single-cell analysis is a powerful tool for dissecting the cellular composition within a tissue or organ. However, it remains difficult to detect rare and common cell types at the same time. Here, we present a new computational method, GiniClust2, to overcome this challenge. GiniClust2 combines the strengths of two complementary approaches, using the Gini index and Fano factor, respectively, through a cluster-aware, weighted ensemble clustering technique. GiniClust2 successfully identifies both common and rare cell types in diverse datasets, outperforming existing methods. GiniClust2 is scalable to large datasets.
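
Illustrative note (not part of the catalogued abstract): the two per-gene statistics named above can be computed from a vector of expression values as sketched below, using their standard textbook definitions (Gini index as normalized mean absolute difference; Fano factor as variance over mean, here with the population variance). Any normalization or gene-selection thresholds used by GiniClust2 itself are not given in the abstract and are not modeled; the function names and example values are assumptions.

```python
def gini_index(values):
    """Gini index of non-negative values: 0 = evenly spread, near 1 = concentrated."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    xs = sorted(values)
    # Sorted-form identity: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return weighted / (n * total)


def fano_factor(values):
    """Variance-to-mean ratio (population variance)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    variance = sum((x - mean) ** 2 for x in values) / n
    return variance / mean


# Hypothetical expression of two genes across four cells.
rare_marker = [0, 0, 0, 4]    # expressed in a single cell only
housekeeping = [1, 1, 1, 1]   # expressed uniformly

gini_rare = gini_index(rare_marker)
gini_uniform = gini_index(housekeeping)
fano_rare = fano_factor(rare_marker)
```

A gene expressed in only a few cells yields a high Gini index, which is the intuition behind using it to flag candidate markers of rare cell types.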

Journal Article
