Catalogue Search | MBRL

in Algorithms , Application , Classification

2023

Clustering has primarily been used as an analytical technique to group unlabeled data for extracting meaningful information. The fact that no clustering algorithm can solve all clustering problems has resulted in the development of several clustering algorithms with diverse applications. We review data clustering, intending to underscore recent applications in selected industrial sectors and other notable concepts. In this paper, we begin by highlighting clustering components and discussing classification terminologies. Furthermore, specific and general applications of clustering are discussed. Notable concepts on clustering algorithms, emerging variants, measures of similarity/dissimilarity, issues surrounding clustering optimization, validation, and data types are outlined. Suggestions are made to emphasize the continued interest in clustering techniques by both scholars and industry practitioners. Key findings in this review show that data size serves as a classification criterion and that, as data sizes for clustering become larger and more varied, determining the optimal number of clusters will require new feature extraction methods, validation indices, and clustering techniques. In addition, clustering techniques have found growing use in key industry sectors linked to the sustainable development goals, such as manufacturing, transportation and logistics, energy, and healthcare, where clustering is more often integrated with other analytical techniques than used as a stand-alone technique.

Journal Article

A Taxonomy of Machine Learning Clustering Algorithms, Challenges, and Future Realms

by Sharif, Zubair , Anwar, Toni , Pitafi, Shahneela in Algorithms , Artificial intelligence , Big Data

2023

In the field of data mining, clustering has been shown to be an important technique. Numerous clustering methods have been devised and put into practice, and most of them locate high-quality or optimum clustering outcomes in the fields of computer science, data science, statistics, pattern recognition, artificial intelligence, and machine learning. This research provides a modern, thorough review of both classic and cutting-edge clustering methods. The review presents the taxonomy of clustering from an applied angle and compares some hierarchical and partitional clustering algorithms across various parameters. We also discuss the open challenges in clustering, such as computational complexity, refinement of clusters, speed of convergence, data dimensionality, effectiveness and scalability, data object representation, evaluation measures, data streams, and knowledge extraction. Scientists and professionals alike will be able to use this review as a benchmark as they strive to advance the state of the art in clustering techniques.

Journal Article

A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm

by Zhu, Jianlin , Li, Shuhua , Song, Xin in Algorithms , Artificial Intelligence , Clustering

2023

Spectral clustering (SC) has become a popular choice for data clustering: it converts a dataset to a graph structure and then identifies optimal subgraphs by graph partitioning to complete the clustering. However, traditional SC has several shortcomings. First, k-means is used at the clustering stage and selects the initial cluster centers randomly, which leads to unstable performance; k-means also requires the number of clusters to be specified in advance (prior knowledge). Second, SC calculates the similarity matrix using the linear Euclidean distance, losing part of the effective information. Third, real datasets usually contain redundant features, but traditional SC does not adequately address multi-attribute data. To solve these issues, we propose an SC algorithm based on the attribute fluctuation and density peaks clustering algorithm (AFDSC) to improve the clustering accuracy and effect. Furthermore, to verify the idea of the AFDSC algorithm, we extract the attribute fluctuation factor and propose a histogram clustering algorithm based on attribute fluctuation (AFHC) that is independent of spectral clustering. Experimental results show that both the AFDSC algorithm and the AFHC algorithm achieve better performance on fifteen UCI datasets than other clustering algorithms.
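For readers unfamiliar with the baseline pipeline the abstract refers to (dataset to graph, then graph partitioning), the following is a minimal sketch of plain two-way spectral clustering, not the AFDSC method. Several choices here are assumptions made for illustration: a Gaussian kernel on Euclidean distance for the similarity matrix, the unnormalized Laplacian L = D - W, a sign split on the Fiedler vector in place of the k-means step, and a simple power iteration in place of a library eigensolver. The names `affinity`, `fiedler_vector`, `spectral_bipartition` and the parameter `sigma` are hypothetical.

```python
import math

def affinity(points, sigma=1.0):
    """Gaussian-kernel similarity matrix built from squared Euclidean distances."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def fiedler_vector(w, iterations=500):
    """Approximate the second-smallest eigenvector of L = D - W by power iteration."""
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # upper bound on the eigenvalues of L (Gershgorin)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start (assumption)
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant eigenvector
        # Multiply by (shift * I - L); its dominant remaining eigenvector
        # is the Fiedler vector of L.
        v = [(shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters by the sign of the Fiedler vector."""
    v = fiedler_vector(affinity(points, sigma))
    return [0 if x < 0 else 1 for x in v]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = spectral_bipartition(points)
```

Which group receives label 0 depends on the sign of the eigenvector, so only the grouping is meaningful. The sign split sidesteps the random k-means initialization the abstract criticizes, but it only handles two clusters; multi-way SC normally embeds the points with several eigenvectors and clusters the embedding.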

Journal Article

Research on incremental clustering algorithm for big data

by Yang, Xiaoqing in Algorithms , Big Data , Clustering

2023

As the scale of data grows ever larger, clustering, a key step in data mining, has important practical significance. To address the time consumption and high clustering error of current clustering algorithms when they deal with massive and dynamic big data, an incremental clustering algorithm is proposed, taking big data as the research object. By exploring the attribute characteristics of big data, four characteristics are summarised: scale, diversity, high speed and value. For large-scale data streams that have multiple attributes and are acquired one by one, the method optimises how the category centre points of the K-means clustering algorithm are set, combines the K-means clustering algorithm with the Kalman filter algorithm, and measures the distance between data point pairs. Instead of Mahalanobis distance, an incremental clustering algorithm suitable for big data is constructed. Five data sets are selected to carry out example analysis. The results of the algorithm are verified by the algorithm. The proposed algorithm has obvious advantages in the incremental clustering effect of big data. At the same time, it also has efficient and stable computing performance, which meets the expected design requirements and goals.
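To illustrate what clustering a stream "acquired one by one" can look like, here is a minimal sketch of the classic sequential (online) K-means update, in which each arriving point moves its nearest centre toward the running mean. It is not the algorithm proposed in the paper: the Kalman filter step and the paper's centre-point initialisation are omitted, squared Euclidean distance is assumed, and the names `incremental_kmeans` and `nearest` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(centroids, x):
    """Return the index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], x))

def incremental_kmeans(stream, initial_centroids):
    """Assign points one at a time and move the winning centroid to its running mean."""
    centroids = [list(c) for c in initial_centroids]
    counts = [1] * len(centroids)  # assumption: each seed counts as one observation
    labels = []
    for x in stream:
        i = nearest(centroids, x)
        counts[i] += 1
        # Running-mean update: c <- c + (x - c) / n
        centroids[i] = [c + (v - c) / counts[i] for c, v in zip(centroids[i], x)]
        labels.append(i)
    return centroids, labels

stream = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0)]
centroids, labels = incremental_kmeans(stream, [(0.0, 0.0), (10.0, 10.0)])
```

Each point is processed once and then discarded, so memory use depends only on the number of centres, which is the property that makes this family of updates attractive for large data streams.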

Journal Article

A Method for Detecting Overlapping Protein Complexes Based on an Adaptive Improved FCM Clustering Algorithm

by Jiang, Kaiying , Wang, Caixia , Wang, Rongquan in Adaptive algorithms , Algorithms , Bioinformatics

2025

A protein complex can be regarded as a functional module developed by interacting proteins. The protein complex has attracted significant attention in bioinformatics as a critical substance in life activities. Identifying protein complexes in protein–protein interaction (PPI) networks is vital in life sciences and biological activities. Therefore, significant efforts have been made recently in biological experimental methods and computing methods to detect protein complexes accurately. This study proposed a new method for PPI networks to facilitate the processing and development of the following algorithms. Then, a combination of the improved density peaks clustering algorithm (DPC) and the fuzzy C-means clustering algorithm (FCM) was proposed to overcome the shortcomings of the traditional FCM algorithm. In other words, the rationality of results obtained using the FCM algorithm is closely related to the selection of cluster centers. The objective function of the FCM algorithm was redesigned based on ‘high cohesion’ and ‘low coupling’. An adaptive parameter-adjusting algorithm was designed to optimize the parameters of the proposed detection algorithm. This algorithm is denoted as the DFPO algorithm (DPC-FCM Parameter Optimization). Finally, the performance of the DFPO algorithm was evaluated using multiple metrics and compared with over ten state-of-the-art protein complex detection algorithms. Experimental results indicate that the proposed DFPO algorithm exhibits improved detection accuracy compared with other algorithms.
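As background for the remark that FCM results depend on the selection of cluster centers, the following is a minimal sketch of the standard fuzzy C-means iteration. It is not the DFPO algorithm: the redesigned objective, the DPC-based center selection, and the adaptive parameter tuning are not reproduced. The function name `fcm`, the fuzzifier default `m=2.0`, the fixed iteration count, and the small constant `eps` are assumptions for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fcm(points, centers, m=2.0, iterations=50, eps=1e-12):
    """Standard fuzzy C-means: alternate membership and center updates."""
    centers = [list(c) for c in centers]
    power = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership of point i in cluster j is proportional to d_ij^(-2/(m-1)),
        # normalized so that each row sums to 1.
        memberships = []
        for x in points:
            inv = [(1.0 / (sq_dist(x, c) + eps)) ** power for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center j is the mean of all points weighted by membership^m.
        for j in range(len(centers)):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            centers[j] = [sum(wt * x[d] for wt, x in zip(weights, points)) / total
                          for d in range(len(points[0]))]
    return centers, memberships

points = [(0.0,), (0.2,), (5.0,), (5.2,)]
centers, memberships = fcm(points, centers=[(1.0,), (4.0,)])
```

The initial `centers` argument is supplied by the caller, and a poor choice can steer the iteration to a different local optimum, which is the sensitivity that motivates pairing FCM with a center-selection step. The returned memberships are those computed in the final iteration, before the last center update.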

Journal Article

Gene regulatory networks for lignin biosynthesis in switchgrass (Panicum virgatum)

by Li, Guifen , Frazier, Taylor P. , Lenaghan, Scott in 09 BIOMASS FUELS , Analysis , Baits

2019

Summary: Cell wall recalcitrance is the major challenge to improving saccharification efficiency in converting lignocellulose into biofuels. However, information regarding the transcriptional regulation of secondary cell wall biogenesis remains poor in switchgrass (Panicum virgatum), which has been selected as a biofuel crop in the United States. In this study, we present a combination of computational and experimental approaches to develop gene regulatory networks for lignin formation in switchgrass. To screen transcription factors (TFs) involved in lignin biosynthesis, we developed a modified method to perform co‐expression network analysis using 14 lignin biosynthesis genes as bait (target) genes. The switchgrass lignin co‐expression network was further extended by adding 14 TFs identified in this study, and seven TFs identified in previous studies, as bait genes. Six TFs (PvMYB58/63, PvMYB42/85, PvMYB4, PvWRKY12, PvSND2 and PvSWN2) were targeted to generate overexpressing and/or down‐regulated transgenic switchgrass lines. The alteration of lignin content, cell wall composition and/or plant growth in the transgenic plants supported the role of the TFs in controlling secondary wall formation. RNA‐seq analysis of four of the transgenic switchgrass lines revealed downstream target genes of the secondary wall‐related TFs and crosstalk with other biological pathways. In vitro transactivation assays further confirmed the regulation of specific lignin pathway genes by four of the TFs. Our meta‐analysis provides a hierarchical network of TFs and their potential target genes for future manipulation of secondary cell wall formation for lignin modification in switchgrass.

Journal Article

Detecting Very Weak Signals: A Mixed Strategy to Deal with Biologically Relevant Information

by Giuliani, Alessandro , Zeuner, Ann , Vici, Alessandro in Algorithms , Cluster analysis , Clustering

2025

In many biological investigations, the relevant information does not coincide with the most powerful signals (most elevated eigenvalues, dominant frequencies, most populated clusters...), but very often hides in minor features that are difficult to discriminate from random noise. Here we propose an algorithm that, by the combined use of a non-linear cluster analysis procedure and a strategy to discriminate minor signal components from noise, allows singling out biologically relevant hidden information. We tested the algorithm on a sparse data set of single-cell RNA-Seq measurements and were able to identify a very small population of cells in charge of the immune response toward cancer tissue.

Journal Article

Research on Digital Design of Modern Sculpture in New Media Era

by Zhang, Chaoyang , Chen, Xiaozhong in 65Y04 , Algorithms , Clustering

2024

This paper explores the digital design of modern sculpture in the era of new media, addressing the challenges and opportunities that traditional sculpture encounters in digital transformation. Considering the innovation that digital technology brings to artistic creation, the precision and efficiency of sculpture digitization are improved by introducing the non-uniform rational B-spline (NURBS) method and the fuzzy C-means (FCM) clustering algorithm, which together allow precise analysis and characterization of the surface geometric parameters of three-dimensional sculptures. The high-order surfaces of the sculptures can be represented effectively by the NURBS method, whereas the FCM clustering algorithm performs highly efficiently in surface partitioning planning. The NURBS-based FCM algorithm can reduce the root-mean-square error of point cloud splicing to 0.0853 mm, reduce the number of iterations to 3, and shorten the algorithm’s running time to 18.46 seconds. The practice of digital sculpture application shows that the method improves work efficiency and reduces production costs. The digital design method proposed in this study provides a new way of producing and creating modern sculpture, which helps develop and preserve traditional sculpture art in the new media era.

Journal Article

Two-phase clustering algorithm with density exploring distance measure

by Jiang, Xiangming , Ma, Jingjing , Gong, Maoguo in Algorithms , C1140Z Other topics in statistics , C1160 Combinatorial mathematics

2018

Here, the authors propose a novel two-phase clustering algorithm with a density exploring distance (DED) measure. In the first phase, the fast global K-means clustering algorithm is used to obtain the cluster number and the prototypes. Then, the prototypes of all these clusters and representatives of the points belonging to these clusters are regarded as the input data set of the second phase. Afterwards, all the prototypes are clustered according to a DED measure, which makes data points located in the same structure possess high similarity with each other. In experimental studies, the authors test the proposed algorithm on seven artificial as well as seven UCI data sets. The results demonstrate that the proposed algorithm is flexible across different data distributions and is better able than the comparison algorithms to cluster data sets with complex non-convex distributions.

Journal Article

Comprehensive Evaluation of Multi-Omics Clustering Algorithms for Cancer Molecular Subtyping

by Zhu, Yunping , Wang, Lingxiao , Liu, Yi in Accuracy , Algorithms , Analysis

2025

Cancer is a highly heterogeneous and complex disease, and identifying its molecular subtypes is crucial for accurate diagnosis and personalized treatment. The integration of multi-omics data enables a comprehensive interpretation of the molecular characteristics of cancer at various biological levels. In recent years, an increasing number of multi-omics clustering algorithms for cancer molecular subtyping have been proposed. However, the absence of a definitive gold standard makes it challenging to evaluate and compare these methods effectively. In this study, we developed a general framework for the comprehensive evaluation of multi-omics clustering algorithms and introduced an innovative metric, the accuracy-weighted average index, which simultaneously considers both clustering performance and clinical relevance. Using this framework, we performed a thorough evaluation and comparison of 11 state-of-the-art multi-omics clustering algorithms, including deep learning-based methods. By integrating the accuracy-weighted average index with computational efficiency, our analysis reveals that PIntMF demonstrates the best overall performance, making it a promising tool for molecular subtyping across a wide range of cancers.

Journal Article
