Catalogue Search | MBRL

How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification

by Liao, Tim F.; Hennig, Christian in Analysis, Application, Appropriateness

2013

Data with mixed-type (metric–ordinal–nominal) variables are typical for social stratification, i.e. partitioning a population into social classes. Approaches to cluster such data are compared, namely a latent class mixture model assuming local independence and dissimilarity-based methods such as k-medoids. The design of an appropriate dissimilarity measure and the estimation of the number of clusters are discussed as well, comparing the Bayesian information criterion with dissimilarity-based criteria. The comparison is based on a philosophy of cluster analysis that connects the problem of a choice of a suitable clustering method closely to the application by considering direct interpretations of the implications of the methodology. The application of this philosophy to economic data from the 2007 US Survey of Consumer Finances demonstrates techniques and decisions required to obtain an interpretable clustering. The clustering is shown to be significantly more structured than a suitable null model. One result is that the data-based strata are not as strongly connected to occupation categories as is often assumed in the literature.

Journal Article
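
The abstract above pairs a dissimilarity measure for mixed-type variables with k-medoids. The following is a minimal sketch of that combination, not the authors' procedure: it assumes a Gower-style coefficient (range-scaled absolute differences for metric and rank-coded ordinal variables, simple mismatch for nominal ones) and a basic alternating k-medoids update rather than full PAM. The toy records, variable names, and the naive initialisation are all hypothetical.

```python
def gower(a, b, kinds, ranges):
    """Gower-style dissimilarity in [0, 1], averaged over variables."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "nominal":
            total += 0.0 if x == y else 1.0      # simple mismatch
        else:                                    # metric, or ordinal coded as ranks
            total += abs(x - y) / rng if rng else 0.0
    return total / len(kinds)


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    medoids = list(range(k))                     # naive start (assumption)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:                      # keep old medoid if cluster empties
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity
            # to the other members of its cluster.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Hypothetical records: (income in $1000s, education rank, housing tenure)
rows = [(20.0, 1, "rent"), (22.0, 1, "rent"), (25.0, 2, "rent"),
        (90.0, 4, "own"), (95.0, 5, "own"), (99.0, 5, "own")]
kinds = ("metric", "ordinal", "nominal")
ranges = [None if kind == "nominal"
          else max(r[j] for r in rows) - min(r[j] for r in rows)
          for j, kind in enumerate(kinds)]

dist = [[gower(a, b, kinds, ranges) for b in rows] for a in rows]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because the algorithm only ever reads the dissimilarity matrix, the choice of how to weight and scale each variable type stays separate from the clustering step, which is the design decision the abstract emphasises.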

K‐medoids clustering of hospital admission characteristics to classify severity of influenza virus infection

by Leis, Aleda M.; Ferdinands, Jill; Patel, Manish in Acuity, Age groups, Algorithms

2023

Background: Patients are admitted to the hospital for respiratory illness at different stages of their disease course. It is important to appropriately analyse this heterogeneity in surveillance data to accurately measure disease severity among those hospitalized. The purpose of this study was to determine if unique baseline clusters of influenza patients exist and to examine the association between cluster membership and in‐hospital outcomes. Methods: Patients hospitalized with influenza at two hospitals in Southeast Michigan during the 2017/2018 (n = 242) and 2018/2019 (n = 115) influenza seasons were included. Physiologic and laboratory variables were collected for the first 24 h of the hospital stay. K‐medoids clustering was used to determine groups of individuals based on these values. Multivariable linear regression or Firth's logistic regression was used to examine the association between cluster membership and clinical outcomes. Results: Three clusters were selected for 2017/2018, mainly differentiated by blood glucose level. After adjustment, those in C171 had 5.6 times the odds of mechanical ventilator use compared with those in C172 (95% CI: 1.49, 21.1) and a significantly longer mean hospital length of stay than those in both C172 (mean 1.5 days longer, 95% CI: 0.2, 2.7) and C173 (mean 1.4 days longer, 95% CI: 0.3, 2.5). Similar results were seen between the two clusters selected for 2018/2019. Conclusion: In this study of hospitalized influenza patients, we show that distinct clusters with higher disease acuity can be identified and could be targeted for evaluations of vaccine and influenza antiviral effectiveness against disease attenuation. The association of higher disease acuity with glucose level merits evaluation.

Journal Article

A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering

by Botigué, Laura R.; Forest, Félix; Maurin, Olivier in Angiospermae, Angiosperms, Cluster Analysis

2019

Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5–15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.

Journal Article

Maximizing Energy Harvesting with the Aid of Reconfigurable Intelligent Surface for UAV Using Proximal Policy Optimization Algorithm

by Gupta, Priyadarshni; Kumar, Praveen; Mahesh, Rallabhandi S. K. in Algorithms, Cluster analysis, Clustering

2024

Unmanned aerial vehicles (UAVs) equipped with reconfigurable intelligent surfaces (UAV-RIS) can offer ubiquitous communication services in areas where communication is disabled, but they are limited by the on-board energy of the UAVs. This paper presents the EH-RIS system, a novel energy harvesting (EH) strategy designed for high-performance next-generation wireless systems. By utilizing passive reflective arrays to facilitate parallel energy harvesting and information transfer, the EH-RIS system expands upon the idea of Simultaneous Wireless Information and Power Transmission. However, pedestrian mobility and rapid channel changes imposed by external factors make efficient resource allocation in wireless systems challenging. Thus, a robust, model-free, on-policy actor-critic method, the Proximal Policy Optimization algorithm based on Deep Reinforcement Learning, is developed, which improves the decision-making of the proposed EH-RIS system to ensure quality of service in a dynamic wireless environment. K-means and K-medoids clustering are introduced to optimize the UAV trajectory design. Simulation results show that the proposed EH-RIS-based system is both effective and efficient. It performs better than current state-of-the-art systems and is nearly as efficient as exhaustive search strategies. The proposed approach has great potential for enhancing UAV-RIS systems and enabling more connectivity in places where communication is extremely difficult.

Journal Article

A Two-Level Clustered Consensus-Based Bundle Algorithm for Dynamic Heterogeneous Multi-UAV Multi-Task Allocation

by Wang, Chunjiang; Wang, Yichao; Ren, Shuangyin in Algorithms, Analysis, Collaboration

2025

In multi-UAV cooperative tasks, dynamic communication topologies and resource heterogeneity present significant challenges for distributed task allocation, leading to high communication overhead and poor task-resource matching, which in turn increases computational costs. While the Consensus-Based Bundle Algorithm (CBBA) offers a robust decentralized framework, its scalability and adaptability in heterogeneous, large-scale scenarios are limited. To overcome these issues, this paper introduces a novel Two-Level Clustered CBBA (TLC-CBBA). In the first-layer clustering, UAVs are grouped based on communication topology using graph-theoretic centrality measures to rank node importance, followed by clustering based on shortest-path distances to minimize communication costs. In the second-layer clustering, a resource-balanced and distance-aware K-medoids algorithm is applied within each subgroup obtained from the first-layer clustering, taking into account UAV resource heterogeneity and spatial proximity. This method ensures spatial compactness among UAVs within each subgroup while achieving a more balanced distribution of total resources across clusters. Finally, after completing the two-level clustering, each subgroup executes CBBA for local task bundling and consensus, while the cluster centers coordinate inter-cluster communication to guarantee globally consistent and conflict-free task allocation. Simulations across diverse mission scenarios and UAV team sizes demonstrate that TLC-CBBA substantially outperforms CBBA and its variants (DMCHBA, G-CBBA, and Clustering-CBBA) in terms of communication efficiency, total task score, runtime, and significance analysis. The proposed TLC-CBBA demonstrates strong robustness and scalability for heterogeneous multi-UAV task allocation in dynamic environments.

Journal Article

Land Cover Classification Model Using Multispectral Satellite Images Based on a Deep Learning Synergistic Semantic Segmentation Network

by Gharahbagh, Abdorreza Alavi; Hajihashemi, Vahid; Machado, José J. M. in Accuracy, Agricultural land, Artificial intelligence

2025

Land cover classification (LCC) using satellite images is one of the rapidly expanding fields in mapping, highlighting the need for updating existing computational classification methods. Advances in technology and the increasing variety of applications have introduced challenges, such as more complex classes and a demand for greater detail. In recent years, deep learning and Convolutional Neural Networks (CNNs) have significantly enhanced the segmentation of satellite images. Since the training of CNNs requires sophisticated and expensive hardware and significant time, using pre-trained networks has become widespread in the segmentation of satellite images. This study proposes a hybrid synergistic semantic segmentation method based on the Deeplab v3+ network and a clustering-based post-processing scheme. The proposed method accurately classifies various land cover (LC) types in multispectral satellite images, including Pastures, Other Built-Up Areas, Water Bodies, Urban Areas, Grasslands, Forest, Farmland, and Others. The post-processing scheme includes a spectral bag-of-words model and K-medoids clustering to refine the Deeplab v3+ outputs and correct possible errors. The simulation results indicate that combining the post-processing scheme with deep learning improves the Matthews correlation coefficient (MCC) by approximately 5.7% compared to the baseline method. Additionally, the proposed approach is robust to data imbalance and can dynamically update its codewords over different seasons. Finally, the proposed synergistic semantic segmentation method was compared with several state-of-the-art segmentation methods on satellite images of Italy’s Lake Garda (Lago di Garda) region. The results showed that the proposed method outperformed the best existing techniques by at least 6% in terms of MCC.

Journal Article

Automated sparse feature selection in high-dimensional proteomics data via 1-bit compressed sensing and K-Medoids clustering

by Liu, MeiNa; Su, Yue; Wen, FuDong in Accuracy, Algorithms, Automation

2025

Background: High-dimensional proteomics data present significant challenges in biomarker discovery due to technical noise, feature redundancy, and multicollinearity. Current feature selection methods, including filter, wrapper, and embedded approaches, struggle with stability, sparsity, and computational efficiency. To address these limitations, we propose Soft-Thresholded Compressed Sensing (ST-CS), a hybrid framework integrating 1-bit compressed sensing with K-Medoids clustering. Unlike conventional methods relying on manual thresholds, ST-CS automates feature selection by dynamically partitioning coefficient magnitudes into discriminative biomarkers and noise. Results: Evaluations on simulated and real-world proteomic datasets demonstrated ST-CS’s superiority in feature selection capability and classification performance. In simulations, ST-CS achieved feature selection robustness with balanced sensitivity (> 80%) and specificity (> 99.8%), reducing false discovery rates (FDR) by 20–50% compared to Hard-Thresholded Compressed Sensing (HT-CS). Additionally, it attained superior F1 scores and Matthews Correlation Coefficients (MCC), outperforming HT-CS, LASSO, and SPLSDA in identifying true biomarkers while suppressing noise. For classification performance, ST-CS surpassed all methods in the area under the receiver operating characteristic curve (AUC) across varying noise levels while maintaining sparsity. Applied to Clinical Proteomic Tumor Analysis Consortium (CPTAC) datasets, ST-CS matched HT-CS’s classification accuracy (AUC = 97.47% for intrahepatic cholangiocarcinoma) but with 57% fewer selected features (37 vs. 86), demonstrating its dual strength in precision biomarker discovery and predictive accuracy. For glioblastoma data, ST-CS achieved higher AUC (72.71%) than HT-CS (72.15%), LASSO (67.80%), and SPLSDA (71.38%) while retaining a parsimonious feature set (30 vs. 58 features for HT-CS).
In ovarian serous cystadenocarcinoma, ST-CS further demonstrated its adaptability, attaining superior AUC (75.86%) over HT-CS (75.61%), LASSO (61.00%), and SPLSDA (70.75%) with only 24 ± 5 selected biomarkers. These results highlight ST-CS’s ability to rigorously automate feature selection while balancing classification efficacy, interpretability, and scalability for translational proteomics.

Journal Article

Collaborative Forecasting of Multiple Energy Loads in Integrated Energy Systems Based on Feature Extraction and Deep Learning

by Wang, Zhe; Qiu, Xiaoyu; Luo, Fengzhang in Algorithms, China, Clustering

2025

Accurate load forecasting is crucial for the safe, stable, and economical operation of integrated energy systems. However, directly applying single models to predict coupled cooling, heating, and electric loads under complex influencing factors often yields unsatisfactory results. This paper proposes a collaborative load forecasting method based on feature extraction and deep learning. First, the complete ensemble empirical mode decomposition with adaptive noise algorithm decomposes load data, and a dynamic time warping-based k-medoids clustering algorithm reconstructs subsequences aligned with system load components. Second, a correlation analysis identifies the key influencing factors for model input. Then, a multi-task parallel learning framework combining a regression convolutional neural network and long short-term memory networks is developed to predict reconstructed subsequences. Case studies demonstrate that the proposed model achieves mean absolute percentage errors (MAPE) of 2.24%, 2.75%, and 1.69% for electricity, cooling, and heating loads on summer workdays, with mean accuracy (MA) values of 97.76%, 97.25%, and 98.31%, respectively. For winter workdays, the MAPE values are 2.92%, 1.66%, and 2.87%, with MA values of 97.08%, 98.34%, and 97.13%. Compared to traditional single-task models, the weighted mean accuracy (WMA) improves by 2.01% and 2.33% in summer and winter, respectively, validating its superiority. This method provides a high-precision tool for the planning and operation of integrated energy systems.

Journal Article
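
In the abstract above, each reported mean accuracy (MA) equals 100% minus the corresponding MAPE. The sketch below checks that arithmetic and shows the standard MAPE formula. The relation MA = 100% − MAPE is inferred from the reported figures rather than stated in the abstract, and the function names and the two-point example series are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mean_accuracy(mape_percent):
    """Assumed relation, inferred from the reported figures: MA = 100% - MAPE."""
    return 100.0 - mape_percent


# (MAPE %, MA %) pairs for electricity, cooling, and heating loads,
# copied from the abstract.
reported = {
    "summer": [(2.24, 97.76), (2.75, 97.25), (1.69, 98.31)],
    "winter": [(2.92, 97.08), (1.66, 98.34), (2.87, 97.13)],
}

consistent = all(
    abs(mean_accuracy(m) - ma) < 1e-9
    for pairs in reported.values()
    for m, ma in pairs
)
print("reported MA consistent with 100 - MAPE:", consistent)

# Hypothetical two-point series, only to exercise mape().
example = mape([100.0, 200.0], [110.0, 190.0])
print("example MAPE (%):", round(example, 2))
```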

Factors affecting the resilience of subway operations under emergencies – using improved DEMATEL model

by Zhang, Xiaoxue; Bu, Zehui; Liu, Jicai in Accident prevention, Algorithms, Clustering

2024

Purpose: Subway systems are highly susceptible to external disturbances from emergencies, which can trigger consequences such as the paralysis of internal network transportation functions and cause significant economic and safety losses to cities. Therefore, it is necessary to analyze the factors affecting the resilience of the subway system to reduce the impact of disaster incidents. Design/methodology/approach: Using the interval type-2 fuzzy linguistic term set and the K-medoids clustering algorithm, this paper improves the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method to construct a subway resilience factor analysis model for emergencies. Through comparative analysis, this study confirms the superior performance of the proposed approach in enhancing the precision of the DEMATEL method. Findings: The results indicate that the operation and management level of emergency command organizations is the key resilience factor of subway operations in China. Furthermore, based on real case analyses, corresponding suggestions and measures are put forward to improve the overall operational resilience of the subway. Originality/value: This paper identifies four emergency scenarios and 15 resilience factors affecting subway operations through literature review and expert consultation. The improved fuzzy DEMATEL method is applied to explore the levels of influence and causal mechanisms among the resilience factors of the subway system under the four emergency scenarios.

Journal Article

An empirical study on the improvement of students’ physical fitness and health by college physical education programs based on the background of big data

by Hou, Qian in 68T09, Big Data, Cluster centroids

2025

The reform of college physical education courses in the context of big data is of great significance for improving the quality of teaching and meeting the needs of students. The study uses the K-medoids clustering algorithm to personalize the teaching content of college physical education courses. The standard deviation is used to define the initial centroid candidate set, and the initial centroids are determined in a stepwise increasing manner, which ensures that the sample points with greater densities are selected as the initial clustering centroids. Students with similar body types are clustered together by the method, and teachers can create targeted, individualized teaching content for students with different body types. After the implementation of personalized teaching, the physical fitness of both boys and girls improved. The excellent and good rates of boys’ physical health increased by 7.75% and 4.34%, respectively. The excellent and good rates of physical health among female students increased by 14.03%. This shows that students’ physical fitness improved significantly after the physical education program was reformed in the context of big data.

Journal Article
