Catalogue Search | MBRL
44,696 result(s) for "clustering algorithm"
Data clustering: application and trends
2023
Clustering has primarily been used as an analytical technique to group unlabeled data for extracting meaningful information. The fact that no clustering algorithm can solve all clustering problems has resulted in the development of several clustering algorithms with diverse applications. We review data clustering, intending to underscore recent applications in selected industrial sectors and other notable concepts. In this paper, we begin by highlighting clustering components and discussing classification terminologies. Furthermore, specific and general applications of clustering are discussed. Notable concepts on clustering algorithms, emerging variants, measures of similarities/dissimilarities, issues surrounding clustering optimization, validation and data types are outlined. Suggestions are made to emphasize the continued interest in clustering techniques by both scholars and industry practitioners. Key findings in this review show that the size of data serves as a classification criterion and that, as data sizes for clustering become larger and more varied, determining the optimal number of clusters will require new feature extraction methods, validation indices and clustering techniques. In addition, clustering techniques have found growing use in key industry sectors linked to the sustainable development goals, such as manufacturing, transportation and logistics, energy, and healthcare, where clustering is more often integrated with other analytical techniques than used as a stand-alone technique.
Journal Article
A Taxonomy of Machine Learning Clustering Algorithms, Challenges, and Future Realms
by Sharif, Zubair; Anwar, Toni; Pitafi, Shahneela
in Algorithms, Artificial intelligence, Big Data
2023
In the field of data mining, clustering has been shown to be an important technique. Numerous clustering methods have been devised and put into practice, and most of them locate high-quality or optimum clustering outcomes in the fields of computer science, data science, statistics, pattern recognition, artificial intelligence, and machine learning. This research provides a modern, thorough review of both classic and cutting-edge clustering methods. The taxonomy of clustering is presented in this review from an applied angle, along with a comparison of some hierarchical and partitional clustering algorithms with various parameters. We also discuss the open challenges in clustering, such as computational complexity, refinement of clusters, speed of convergence, data dimensionality, effectiveness and scalability, data object representation, evaluation measures, data streams, and knowledge extraction. Scientists and professionals alike will be able to use this review as a benchmark as they strive to advance the state of the art in clustering techniques.
Journal Article
A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm
2023
Spectral clustering (SC) has become a popular choice for data clustering: it converts a dataset to a graph structure and then identifies optimal subgraphs by graph partitioning to complete the clustering. However, it has three shortcomings. First, k-means is used at the clustering stage with randomly selected initial cluster centers, which leads to unstable performance; k-means also requires the number of clusters to be specified in advance (prior knowledge). Second, SC calculates the similarity matrix using the linear Euclidean distance, losing part of the effective information. Third, real datasets usually contain redundant features, but traditional SC does not adequately address multi-attribute data. To solve these issues, we propose an SC algorithm based on attribute fluctuation and the density peaks clustering algorithm (AFDSC) to improve the clustering accuracy and effect. Furthermore, to verify the idea of the AFDSC algorithm, we extract the attribute fluctuation factor and propose a histogram clustering algorithm based on attribute fluctuation (AFHC) that is independent of spectral clustering. Experimental results show that both the AFDSC algorithm and the AFHC algorithm achieve better performance on fifteen UCI datasets than other clustering algorithms.
Journal Article
Research on incremental clustering algorithm for big data
2023
As the scale of data becomes larger and larger, clustering processing, a key step in data mining, has important practical significance. To address the time consumption and high clustering errors of current clustering algorithms when dealing with massive and dynamic big data, an incremental clustering algorithm is proposed, taking big data as the research object. By exploring the attribute characteristics of big data, four characteristics are summarised: scale, diversity, high speed and value. For large-scale data streams that have multiple attributes and are acquired one by one, the method for setting the category centre points of the K-means clustering algorithm is optimised, the K-means clustering algorithm is combined with the Kalman filter algorithm, and the distance between data point pairs is measured. Instead of Mahalanobis distance, an incremental clustering algorithm suitable for big data is constructed. Five data sets are selected to carry out example analysis. The results of the algorithm are verified by the algorithm. The proposed algorithm has obvious advantages in the incremental clustering effect of big data. At the same time, it also has efficient and stable computing performance, which meets the expected design requirements and goals.
Journal Article
A Method for Detecting Overlapping Protein Complexes Based on an Adaptive Improved FCM Clustering Algorithm
by Jiang, Kaiying; Wang, Caixia; Wang, Rongquan
in Adaptive algorithms, Algorithms, Bioinformatics
2025
A protein complex can be regarded as a functional module developed by interacting proteins. The protein complex has attracted significant attention in bioinformatics as a critical substance in life activities. Identifying protein complexes in protein–protein interaction (PPI) networks is vital in life sciences and biological activities. Therefore, significant efforts have been made recently in biological experimental methods and computing methods to detect protein complexes accurately. This study proposed a new method for PPI networks to facilitate the processing and development of the following algorithms. Then, a combination of the improved density peaks clustering algorithm (DPC) and the fuzzy C-means clustering algorithm (FCM) was proposed to overcome the shortcomings of the traditional FCM algorithm. In other words, the rationality of results obtained using the FCM algorithm is closely related to the selection of cluster centers. The objective function of the FCM algorithm was redesigned based on ‘high cohesion’ and ‘low coupling’. An adaptive parameter-adjusting algorithm was designed to optimize the parameters of the proposed detection algorithm. This algorithm is denoted as the DFPO algorithm (DPC-FCM Parameter Optimization). Finally, the performance of the DFPO algorithm was evaluated using multiple metrics and compared with over ten state-of-the-art protein complex detection algorithms. Experimental results indicate that the proposed DFPO algorithm exhibits improved detection accuracy compared with other algorithms.
Journal Article
Gene regulatory networks for lignin biosynthesis in switchgrass (Panicum virgatum)
2019
Cell wall recalcitrance is the major challenge to improving saccharification efficiency in converting lignocellulose into biofuels. However, information regarding the transcriptional regulation of secondary cell wall biogenesis remains poor in switchgrass (Panicum virgatum), which has been selected as a biofuel crop in the United States. In this study, we present a combination of computational and experimental approaches to develop gene regulatory networks for lignin formation in switchgrass. To screen transcription factors (TFs) involved in lignin biosynthesis, we developed a modified method to perform co‐expression network analysis using 14 lignin biosynthesis genes as bait (target) genes. The switchgrass lignin co‐expression network was further extended by adding 14 TFs identified in this study, and seven TFs identified in previous studies, as bait genes. Six TFs (PvMYB58/63, PvMYB42/85, PvMYB4, PvWRKY12, PvSND2 and PvSWN2) were targeted to generate overexpressing and/or down‐regulated transgenic switchgrass lines. The alteration of lignin content, cell wall composition and/or plant growth in the transgenic plants supported the role of the TFs in controlling secondary wall formation. RNA‐seq analysis of four of the transgenic switchgrass lines revealed downstream target genes of the secondary wall‐related TFs and crosstalk with other biological pathways. In vitro transactivation assays further confirmed the regulation of specific lignin pathway genes by four of the TFs. Our meta‐analysis provides a hierarchical network of TFs and their potential target genes for future manipulation of secondary cell wall formation for lignin modification in switchgrass.
Journal Article
Detecting Very Weak Signals: A Mixed Strategy to Deal with Biologically Relevant Information
by Giuliani, Alessandro; Zeuner, Ann; Vici, Alessandro
in Algorithms, Cluster analysis, Clustering
2025
In many biological investigations, the relevant information does not coincide with the most powerful signals (most elevated eigenvalues, dominant frequencies, most populated clusters...), but very often hides in minor features that are difficult to discriminate from random noise. Here we propose an algorithm that, by the combined use of a non-linear cluster analysis procedure and a strategy to discriminate minor signal components from noise, allows singling out biologically relevant hidden information. We tested the algorithm on a sparse data set corresponding to single-cell RNA-Seq measures and were able to identify a very small population of cells in charge of the immune response toward cancer tissue.
Journal Article
Research on Digital Design of Modern Sculpture in New Media Era
2024
This paper explores the digital design of modern sculpture in the era of new media, addressing the challenges and opportunities encountered by traditional sculpture in digital transformation. Considering the innovation brought by digital technology to artistic creation, the precision and efficiency of sculpture digitization are improved by analyzing and characterizing the surface geometric parameters of three-dimensional sculptures using the non-uniform rational B-spline (NURBS) method and the fuzzy C-means (FCM) clustering algorithm. The high-order surfaces of the sculptures can be represented effectively by the NURBS method, whereas the FCM clustering algorithm performs efficiently in surface partitioning planning. The NURBS-based FCM algorithm can reduce the root-mean-square error of point cloud splicing to 0.0853 mm, reduce the number of iterations to 3, and shorten the algorithm’s running time to 18.46 seconds. The practice of digital sculpture application shows that the method improves work efficiency and reduces production costs. The digital design method proposed in this study provides a new way of producing and creating modern sculpture, which helps develop and preserve traditional sculpture art in the new media era.
Journal Article
Two-phase clustering algorithm with density exploring distance measure
by Jiang, Xiangming; Ma, Jingjing; Gong, Maoguo
in Algorithms, C1140Z Other topics in statistics, C1160 Combinatorial mathematics
2018
Here, the authors propose a novel two-phase clustering algorithm with a density exploring distance (DED) measure. In the first phase, the fast global K-means clustering algorithm is used to obtain the cluster number and the prototypes. Then, the prototypes of all these clusters and representatives of points belonging to these clusters are regarded as the input data set of the second phase. Afterwards, all the prototypes are clustered according to a DED measure, which makes data points located in the same structure possess high similarity with each other. In experimental studies, the authors test the proposed algorithm on seven artificial as well as seven UCI data sets. The results demonstrate that the proposed algorithm adapts to different data distributions and is better at clustering data sets with complex non-convex distributions than the comparison algorithms.
Journal Article
Comprehensive Evaluation of Multi-Omics Clustering Algorithms for Cancer Molecular Subtyping
2025
Cancer is a highly heterogeneous and complex disease, and identifying its molecular subtypes is crucial for accurate diagnosis and personalized treatment. The integration of multi-omics data enables a comprehensive interpretation of the molecular characteristics of cancer at various biological levels. In recent years, an increasing number of multi-omics clustering algorithms for cancer molecular subtyping have been proposed. However, the absence of a definitive gold standard makes it challenging to evaluate and compare these methods effectively. In this study, we developed a general framework for the comprehensive evaluation of multi-omics clustering algorithms and introduced an innovative metric, the accuracy-weighted average index, which simultaneously considers both clustering performance and clinical relevance. Using this framework, we performed a thorough evaluation and comparison of 11 state-of-the-art multi-omics clustering algorithms, including deep learning-based methods. By integrating the accuracy-weighted average index with computational efficiency, our analysis reveals that PIntMF demonstrates the best overall performance, making it a promising tool for molecular subtyping across a wide range of cancers.
Journal Article