Catalogue Search | MBRL
60 result(s) for "Complete clustering"
Clustering data streams using grid-based synopsis
by Bhatnagar, Vasudha; Kaur, Sharanjit; Chakravarthy, Sharma
in Adaptive algorithms; Algorithms; Applied sciences
2014
Continually advancing technology has made it feasible to capture data online for onward transmission as a steady flow of newly generated data points, termed as data stream. Continuity and unboundedness of data streams make storage of data and multiple scans of data an impractical proposition for the purpose of knowledge discovery. Need to learn structures from data in streaming environment has been a driving force for making clustering a popular technique for knowledge discovery from data streams. Continuous nature of streaming data makes it infeasible to look for point membership among the clusters discovered so far, necessitating employment of a synopsis structure to consolidate incoming data points. This synopsis is exploited for building clustering scheme to meet subsequent user demands. The proposed Exclusive and Complete Clustering (ExCC) algorithm captures non-overlapping clusters in data streams with mixed attributes, such that each point either belongs to some cluster or is an outlier/noise. The algorithm is robust, adaptive to changes in data distribution and detects succinct outliers on-the-fly. It deploys a fixed granularity grid structure as synopsis and performs clustering by coalescing dense regions in grid. Speed-based pruning is applied to synopsis prior to clustering to ensure currency of discovered clusters. Extensive experimentation demonstrates that the algorithm is robust, identifies succinct outliers on-the-fly and is adaptive to change in the data distribution. ExCC algorithm is further evaluated for performance and compared with other contemporary algorithms.
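The idea of a fixed-granularity grid synopsis whose dense regions are coalesced into clusters can be illustrated with a rough Python sketch. This is not the ExCC algorithm itself: it assumes two numeric attributes only, omits pruning entirely, and the function name `grid_clusters`, the parameters `cell_width` and `min_count`, and the sample points are all hypothetical.

```python
from collections import Counter, deque

def grid_clusters(points, cell_width, min_count):
    """Cluster 2-D points by joining adjacent dense grid cells (illustrative only)."""
    # Synopsis: how many points fell into each fixed-size grid cell
    counts = Counter(
        (int(x // cell_width), int(y // cell_width)) for x, y in points
    )
    # A cell is "dense" when it holds at least min_count points (assumed rule)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over dense cells that share an edge
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            cx, cy = queue.popleft()
            component.append((cx, cy))
            for nxt in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nxt in dense and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(component))
    return clusters

# Hypothetical data points; the last one lands alone in a sparse cell
stream = [(0.1, 0.2), (0.4, 0.9), (1.2, 0.3), (1.7, 0.8),
          (5.5, 5.5), (5.1, 5.9), (9.0, 1.0)]
cells = grid_clusters(stream, cell_width=1.0, min_count=2)
print(cells)  # [[(0, 0), (1, 0)], [(5, 5)]]
```

In this sketch, points in cells that never become dense are simply left out of every cluster, which loosely mirrors the abstract's treatment of such points as outliers or noise.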
Journal Article
Clustering in small area estimation with area level linear mixed models
2017
Finding reliable estimates of parameters of subpopulations (areas) in small area estimation is an important problem especially when there are few or no samples in some areas. Clustering small areas on the basis of the Euclidean distance between their corresponding covariates is proposed to obtain smaller mean-squared prediction error (MSPE) for the predicted values of area means by using area level linear mixed models. We first propose a statistical test to investigate the homogeneity of variance components between clusters. Then, we obtain the empirical best linear unbiased predictor of small area means by taking into account the difference between variance components in different clusters. We study the performance of our proposed test as well as the effect of the clustering on the MSPE of small area means by using simulation studies. We also obtain a second-order approximation to the MSPE of small area means and derive a second-order unbiased estimator of the MSPE. The results show that the MSPE of small area means can be improved when the variance components are different. The improvement in the MSPE is significant when the difference between variance components is considerable. Finally, the methodology proposed is applied to a real data set.
Journal Article
Application of Two-Step Entropy–TOPSIS Method and Complete Linkage Clustering for Water-Pumping Windmill Investment on Thailand Peninsula
by Klongboonjit, Sakon; Kiatcharoenpol, Tossapol
in Agricultural production; Agriculture; Cost control
2024
This study focuses on identifying suitable areas for the installation of water-pumping windmills in Thailand, which require wind speeds of at least 4 m/s to operate efficiently. A simple combined approach is introduced, integrating the Entropy–TOPSIS method with complete linkage clustering to prioritize and categorize potential locations. Out of 271 initial areas, 28 have been selected based on their ability to meet the 4 m/s wind speed threshold. The Entropy–TOPSIS method first evaluates these areas based on monthly wind speed and agricultural area. The analysis reveals that regions with higher wind speeds generally score better for wind energy potential, while areas with larger agricultural spaces tend to score higher for farming suitability. The final integrated scores show that agricultural area is more significant, with a weight of 0.7788, compared to the wind speed weight of 0.2212. The areas are then ranked, and complete linkage clustering groups them into six categories, from the most to the least suitable for windmill installation. A sensitivity analysis confirms the robustness of the clustering method, as the group composition remains stable despite minor changes in weight adjustments. This approach simplifies decision-making for sustainable energy investments in Thailand's agriculture sector.
Journal Article
Parameterization in the Analysis of Changes in the Rural Landscape on the Example of Agritourism Farms in Kłodzko District (Poland)
by Bocheńska-Skałecka, Anna; Ostrowska-Dudys, Maria; Jakubowski, Wojciech
in Agricultural production; Agriculture; Culture
2022
The European Landscape Convention (2006) indicates that landscape conservation is as important as the protection of the overall environment. Although the boundaries between urban and rural areas in many countries are blurring, the rural landscape is still perceived as a valuable landscape artefact. Traditional rural landscapes have undergone significant transformations over the past few decades. The authors attempt to analyze factors causing apparent changes in the rural landscape, based on the example of agritourism farms in Kłodzko District, Lower Silesia. The changes taking place in Poland after 1989 resulted in reduced profitability of agricultural production. This was why small farms stopped using land for agricultural production. Agritourism has become one of the forms of business activity. Therefore, it became necessary to adapt farms to a new function. The 37 agritourism farms registered in rural and rural-urban municipalities of Kłodzko District have been randomly selected for the survey. The research has shown the extent of changes related to the transformation of agricultural farms into agritourism ones. Six areas (categories) where changes took place have been identified based on the analysis of collected data. The authors have included the collected data in the parameterization of surveyed agritourism farms, taking into account: the condition of the agricultural farm before introducing its new role (0) and the present condition, with an agritourism function (1). Complete linkage clustering (based on the maximum distance), a form of cluster analysis, was used to examine the variables in terms of farm change. The aim was to select outstanding units from the research sample for further research as case studies.
Journal Article
Classification
by Wildi, Otto
in aim of classification, internally homogeneous and distinct from other groups; average‐linkage clustering ‐ UPGMA, WPGMA, UPGMC and WPGMC; BIOLOGY, LIFE SCIENCES
2011,2010
This chapter contains sections titled:
Group structures
Linkage clustering
Minimum‐variance clustering
Average‐linkage clustering: UPGMA, WPGMA, UPGMC and WPGMC
Forming groups
Structured synoptic tables
Book Chapter
Prediction of dam deformation using adaptive noise CEEMDAN and BiGRU time series modeling
by WANG Zixuan; OU Bin; FU Shuyan
in dam deformation; complete ensemble empirical mode decomposition of adaptive noise; sample entropy reconstruction; k-means clustering algorithm; symbiotic search algorithm; variational mode decomposition
2025
【Background and Objective】Accurate prediction of dam deformation is crucial for ensuring the safety of dam structures in engineering monitoring. Dam deformation is influenced by multiple factors, including water pressure, temperature, and material aging, which often exhibit nonlinear and dynamic relationships. During monitoring, system noise and observation errors frequently interfere with data quality, posing additional challenges for analysis. To address the challenges posed by system noise and strong nonlinear effects in dam deformation, this paper proposes a dam deformation monitoring model based on multi-layer integrated signal processing technology.【Method】The model uses sample entropy reconstruction and the K-means clustering algorithm to optimize the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) process, generating multiple intrinsic mode functions (IMF). High-frequency modal components undergo secondary decomposition using variational mode decomposition (VMD) to extract the optimal intrinsic mode function. Finally, an improved symbiotic biological search algorithm combined with a Bidirectional Gated Recurrent Unit (BiGRU) is used to accurately predict dam deformation.【Result】Case analysis demonstrates that, compared to traditional prediction models, the proposed model achieves a root mean square error (RMSE) of 0.0319 mm, mean absolute error (MAE) of 0.0153 mm, mean absolute percentage error (MAPE) of 2.51%, and determination coefficient (R²) of 0.9712.【Conclusion】The results verify that the proposed model captures and simulates the dam deformation process more accurately, exhibiting higher prediction accuracy and stronger generalization ability.
Journal Article
Hierarchical and k-Means Clustering
by Larose, Daniel T; Larose, Chantal D
in complete‐linkage clustering; hierarchical clustering methods; k‐means clustering
2014
Clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters. Clustering is often performed as a preliminary step in a data mining process. This chapter discusses hierarchical clustering methods and describes the k‐means clustering algorithm. In hierarchical clustering, a treelike cluster structure is created through recursive partitioning (divisive methods) or combining (agglomerative methods) of existing clusters. Single‐linkage clustering seeks the minimum distance between any records in two clusters. Complete‐linkage clustering seeks to minimize the distance among the records in two clusters that are farthest from each other. The k‐means clustering algorithm is a straightforward and effective algorithm for finding clusters in data. The Enterprise Miner clustering node uses SAS's FASTCLUS procedure, a version of the k‐means algorithm.
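The contrast between the single-linkage and complete-linkage distances described above can be illustrated with a small Python sketch. The function names and sample records are hypothetical and are not taken from the chapter; Euclidean distance is assumed.

```python
from itertools import product
from math import dist

def single_linkage(a, b):
    """Smallest distance between any record in a and any record in b."""
    return min(dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Largest distance between any record in a and any record in b."""
    return max(dist(p, q) for p, q in product(a, b))

# Hypothetical one-dimensional records, written as 1-tuples
cluster_a = [(0.0,), (1.0,)]
cluster_b = [(3.0,), (6.0,)]

print(single_linkage(cluster_a, cluster_b))    # 2.0 (records 1.0 and 3.0)
print(complete_linkage(cluster_a, cluster_b))  # 6.0 (records 0.0 and 6.0)
```

An agglomerative method would evaluate one of these distances for every pair of clusters and merge the pair with the smallest value.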
Book Chapter
Upper and lower bounds for complete linkage in general metric spaces
by Schmidt, Melanie; Wargalla, Julian; Großwendt, Anna
in Algorithms; Approximation; Artificial Intelligence
2024
In a hierarchical clustering problem the task is to compute a series of mutually compatible clusterings of a finite metric space $(P, \mathrm{dist})$. Starting with the clustering where every point forms its own cluster, one iteratively merges two clusters until only one cluster remains. Complete linkage is a well-known and popular algorithm to compute such clusterings: in every step it merges the two clusters whose union has the smallest radius (or diameter) among all currently possible merges. We prove that the radius (or diameter) of every $k$-clustering computed by complete linkage is at most by factor $O(k)$ (or $O(k^{\ln(3)/\ln(2)}) = O(k^{1.59})$) worse than an optimal $k$-clustering minimizing the radius (or diameter). Furthermore we give a negative answer to the question proposed by Dasgupta and Long (J Comput Syst Sci 70(4):555–569, 2005. https://doi.org/10.1016/j.jcss.2004.10.006), who show a lower bound of $\Omega(\log(k))$ and ask if the approximation guarantee is in fact $\Theta(\log(k))$. We present instances where complete linkage performs poorly in the sense that the $k$-clustering computed by complete linkage is off by a factor of $\Omega(k)$ from an optimal solution for radius and diameter. We conclude that in general metric spaces complete linkage does not perform asymptotically better than single linkage, merging the two clusters with smallest inter-cluster distance, for which we prove an approximation guarantee of $O(k)$.
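The merge rule stated in the abstract (repeatedly join the two clusters whose union has the smallest diameter) can be sketched in a few lines of Python. This is a naive illustration, not the authors' code: it assumes Euclidean points, uses the diameter variant only, and the names `diameter` and `complete_linkage_clustering` as well as the sample points are hypothetical.

```python
from itertools import combinations
from math import dist

def diameter(cluster):
    """Largest pairwise distance inside a cluster (0.0 for a single point)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def complete_linkage_clustering(points, k):
    """Greedily merge clusters until k remain; each step merges the pair
    whose union has the smallest diameter."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical points in the plane
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = complete_linkage_clustering(points, 3)
print(sorted(sorted(c) for c in result))
# [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(10.0, 0.0)]]
```

The sketch recomputes every candidate diameter at each step, so it is only suitable for tiny inputs; it says nothing about the approximation bounds proved in the article.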
Journal Article
Hierarchical Clustering via Single and Complete Linkage Using Fully Homomorphic Encryption
2024
Hierarchical clustering is a widely used data analysis technique. Typically, tools for this method operate on data in its original, readable form, raising privacy concerns when a clustering task involving sensitive data that must remain confidential is outsourced to an external server. To address this issue, we developed a method that integrates Cheon-Kim-Kim-Song homomorphic encryption (HE), allowing the clustering process to be performed without revealing the raw data. In hierarchical clustering, the two nearest clusters are repeatedly merged until the desired number of clusters is reached. The proximity of clusters is evaluated using various metrics. In this study, we considered two well-known metrics: single linkage and complete linkage. Applying HE to these methods involves sorting encrypted distances, which is a resource-intensive operation. Therefore, we propose a cooperative approach in which the data owner aids the sorting process and shares a list of data positions with a computation server. Using this list, the server can determine the clustering of the data points. The proposed approach ensures secure hierarchical clustering using single and complete linkage methods without exposing the original data.
Journal Article
An imbalanced ensemble learning method based on dual clustering and stage-wise hybrid sampling
2023
Imbalanced data classification remains a research hotspot and a challenging problem in the field of machine learning. The challenge of imbalanced learning lies not only in the class imbalance problem but also in the complex class overlapping problem. However, most existing algorithms focus mainly on the former, and this limitation holds existing methods back. To address this limitation, this paper proposes an ensemble algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS) that addresses both class imbalance and class overlapping. DCSHS has three main parts: a projection clustering combination framework (PCC), stage-wise hybrid sampling (SHS) and an envelope clustering transfer mapping mechanism (CTM). PCC creates multiple subsets through projective clustering. SHS identifies the overlapping region of each subset and conducts hybrid sampling. CTM explores more information about the samples in each subset by combining clustering and transfer learning. First, we design a PCC framework guided by the Davies-Bouldin clustering effectiveness index (DBI), which is used to obtain high-quality clusters and combine them into a set of cross-complete subsets (CCS) with low overlapping. Second, according to the characteristics of the subset classes, an SHS algorithm is designed to de-overlap and balance the subsets. Finally, a CTM is constructed for all processed subsets by means of transfer learning, thereby reducing class overlapping and exploring the structural information of the samples. Weak classifiers are trained on the balanced subsets and fused, as in other imbalanced ensemble algorithms. The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples and learn as much information about overlapping samples as possible, thereby improving the handling of class overlapping while balancing the classes. In the experimental section, more than 30 public datasets and over ten representative algorithms are chosen for verification. The experimental results show that DCSHS achieves the best results, by a significant margin, in terms of anti-overlapping, Recall, F1-M, G-M, AUC, and diversity.
Journal Article