Catalogue Search | MBRL
101 result(s) for "Kmeans"
Multi-modal AI reveals thermal environments: LST evolution and driving factors in the Yangtze River delta urban agglomeration, China
by Liu, Guoyin
in KMeans-DBSCAN hybrid clustering model (KMeans-DBSCAN), multimodal AI, urban heat island (UHI)
2026
The rapid pace of urbanization has intensified the urban heat environment, posing significant challenges to sustainable urban development. This study takes the Yangtze River Delta (YRD) urban agglomeration as its research area and uses MODIS summer land surface temperature (LST) remote sensing data at 1 km spatial resolution from 2000 to 2022. It proposes a multi-modal AI-driven integrated framework that combines, through weighted fusion, Getis-Ord G spatial clustering analysis, Isolation Forest anomaly detection, KMeans-DBSCAN hybrid clustering, and the percentile threshold method, and it evaluates the integrated results using hotspot coverage, temperature contrast, and spatial consistency. The study found that the heat-spot area in the YRD region increased significantly, from less than 1% in 2000 to approximately 12% in 2022, with a phased surge after 2010, forming a continuous heat island belt spanning core cities such as Shanghai, Hangzhou, and Suzhou. The integrated model improved hotspot identification performance by more than 20% compared with single methods. According to the XGBoost-SHAP analysis, a 1% increase in per capita GDP and in impervious surface area (ISA) expansion reduces heat island area by 1.02% and 0.87%, respectively. Conversely, a 1-unit increase in annual average LST and in the nighttime light index increases heat island area by 0.94% and 0.88%, respectively. This study emphasizes mitigating regional urban heat island effects through differentiated spatial strategies and green infrastructure development, and it may provide an important scientific basis for urban climate adaptation planning and sustainable development.
Journal Article
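The weighted-fusion step in the heat-island abstract above can be illustrated with a minimal Python sketch. It implements only the percentile threshold detector and a weighted vote over binary hotspot masks. The 80th-percentile cut-off, the detector weights, and the 0.5 vote threshold are assumptions for illustration and are not values from the paper. The other detectors (Getis-Ord, Isolation Forest, KMeans-DBSCAN) are represented by hypothetical precomputed masks.

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_hotspots(lst: List[float], q: float = 90.0) -> List[int]:
    """Percentile threshold method: flag cells whose LST is at or above the q-th percentile."""
    cut = percentile(lst, q)
    return [1 if t >= cut else 0 for t in lst]


def weighted_fusion(masks: Dict[str, List[int]], weights: Dict[str, float],
                    threshold: float = 0.5) -> List[int]:
    """Flag a cell when the weight-normalized share of detectors voting 'hotspot' reaches threshold."""
    total = sum(weights[name] for name in masks)
    n_cells = len(next(iter(masks.values())))
    fused = []
    for i in range(n_cells):
        score = sum(weights[name] * mask[i] for name, mask in masks.items()) / total
        fused.append(1 if score >= threshold else 0)
    return fused


def hotspot_coverage(mask: List[int]) -> float:
    """Share of cells flagged as hotspots."""
    return sum(mask) / len(mask)


if __name__ == "__main__":
    lst = [30.1, 31.0, 35.2, 36.8, 29.5, 37.4, 33.0, 30.8, 38.1, 32.2]  # toy LST values (deg C)
    masks = {
        "percentile": percentile_hotspots(lst, q=80.0),
        "getis_ord": [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],  # hypothetical detector output
        "iforest": [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],    # hypothetical detector output
    }
    weights = {"percentile": 2.0, "getis_ord": 1.0, "iforest": 1.0}  # hypothetical weights
    fused = weighted_fusion(masks, weights)
    print(fused, hotspot_coverage(fused))
```

A weighted vote keeps any single detector from dominating the result unless its weight alone reaches the threshold. In the toy run, the percentile detector's weight of 2 out of 4 is exactly enough to carry a cell on its own.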
DSGD++: Reducing Uncertainty and Training Time in the DSGD Classifier through a Mass Assignment Function Initialization Technique
2025
Several studies have shown that Dempster-Shafer theory (DST) can be successfully applied to scenarios where model interpretability is essential. Although DST-based algorithms offer significant benefits, they face efficiency challenges. We present a method for the Dempster-Shafer Gradient Descent (DSGD) algorithm that significantly reduces training time, by a factor of 1.6, and also reduces the uncertainty of each rule (a condition on features leading to a class label) by a factor of 2.1, while preserving accuracy comparable to other statistical classification techniques. Our main contribution is the introduction of a "confidence" level for each rule. First, we define the "representativeness" of a data point as the distance from its class's center. Each rule's confidence is then calculated from the representativeness of the data points it covers. This confidence is incorporated into the initialization of the corresponding Mass Assignment Function (MAF), providing a better starting point for DSGD's optimizer and enabling faster, more effective convergence. The code is available at https://github.com/HaykTarkhanyan/DSGD-Enhanced.
Journal Article
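The rule-confidence idea in the DSGD++ entry can be sketched in plain Python. The abstract states only that representativeness comes from a point's distance to its class center and that a rule's confidence is computed from the representativeness of the points it covers. Several details here are assumptions made for illustration:

- turning distance into a score where closer points count more (1 / (1 + d));
- averaging over covered points;
- placing the confidence on the rule's class, with the remainder on the uncertainty, in the initial masses.

The authors' actual implementation is in the repository linked in the abstract.

```python
import math
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def class_centers(X: List[Point], y: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def representativeness(X: List[Point], y: List[int]) -> List[float]:
    """Assumed mapping: 1 / (1 + Euclidean distance to own class center); closer scores higher."""
    centers = class_centers(X, y)
    return [1.0 / (1.0 + math.dist(x, centers[label])) for x, label in zip(X, y)]


def rule_confidence(rule: Callable[[Point], bool], X: List[Point], rep: List[float]) -> float:
    """Assumed aggregation: mean representativeness of the points the rule covers (0 if none)."""
    covered = [r for x, r in zip(X, rep) if rule(x)]
    return sum(covered) / len(covered) if covered else 0.0


def initial_masses(confidence: float, rule_class: int, n_classes: int) -> List[float]:
    """Hypothetical MAF initialization: `confidence` on the rule's class and the remainder
    on the full frame (uncertainty), which is stored as the last entry."""
    masses = [0.0] * (n_classes + 1)
    masses[rule_class] = confidence
    masses[-1] = 1.0 - confidence
    return masses


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
    y = [0, 0, 0, 1]
    rep = representativeness(X, y)
    conf = rule_confidence(lambda x: x[1] < 1.0, X, rep)  # hypothetical rule "feature_1 < 1 -> class 0"
    print(rep, conf, initial_masses(conf, rule_class=0, n_classes=2))
```

Under these assumptions, a rule that covers only points close to their class center gets a high confidence. Its initial mass then starts away from full uncertainty, which is the intuition behind the faster convergence the abstract reports.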
DISCRETIZING UNOBSERVED HETEROGENEITY
by Bonhomme, Stéphane; Lamadon, Thibaut; Manresa, Elena
in Classification, Clustering, dimension reduction
2022
We study discrete panel data methods where unobserved heterogeneity is revealed in a first step, in environments where population heterogeneity is not discrete. We focus on two-step grouped fixed-effects (GFE) estimators, where individuals are first classified into groups using kmeans clustering, and the model is then estimated allowing for group-specific heterogeneity. Our framework relies on two key properties: heterogeneity is a function—possibly nonlinear and time-varying—of a low-dimensional continuous latent type, and informative moments are available for classification. We illustrate the method in a model of wages and labor market participation, and in a probit model with time-varying heterogeneity. We derive asymptotic expansions of two-step GFE estimators as the number of groups grows with the two dimensions of the panel. We propose a data-driven rule for the number of groups, and discuss bias reduction and inference.
Journal Article
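The two-step grouped fixed-effects (GFE) procedure in the entry above can be illustrated on a toy panel. This sketch makes three assumptions, none of which reproduce the paper's estimators, bias reduction, or data-driven rule for the number of groups:

- the simplest model, y_it = alpha_{g(i),t} + noise;
- each individual's time average of y as the classification moment;
- a deterministic one-dimensional k-means for the first step.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's algorithm in one dimension, initialized at evenly spaced order statistics."""
    s = sorted(values)
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for each value.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members (kept if empty).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def two_step_gfe(panel: List[List[float]], k: int) -> Tuple[List[int], Dict[int, List[float]]]:
    """Step 1: classify individuals by k-means on a moment (here, the time average).
    Step 2: estimate group-specific time effects alpha[g][t] as within-group means."""
    moments = [sum(row) / len(row) for row in panel]
    groups, _ = kmeans_1d(moments, k)
    n_periods = len(panel[0])
    alpha: Dict[int, List[float]] = {}
    for g in sorted(set(groups)):
        rows = [row for row, gi in zip(panel, groups) if gi == g]
        alpha[g] = [sum(r[t] for r in rows) / len(rows) for t in range(n_periods)]
    return groups, alpha


if __name__ == "__main__":
    panel = [[1.0, 1.2], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9]]  # 4 individuals, 2 periods
    groups, alpha = two_step_gfe(panel, k=2)
    print(groups, alpha)
```

With this additive model, the second step reduces to group-by-period means. In richer models, the second step would instead estimate the structural model with group-specific parameters.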
Research on the classification and application of precision marketing based on big data e-commerce platforms
2024
This paper improves the traditional K-means algorithm and uses the resulting HC-Kmeans algorithm to analyze in depth the marketing status quo of the M e-commerce platform, constructing an overall precision marketing framework based on big data technology. The RFM model is used to measure customer value and segment customer behavior, and marketing strategies are formulated to match the consumption habits and preferences of different categories of users. Data from 40 consecutive days of observation are used to verify the precision marketing effect with A/B testing, and the results show clear improvements in click rate, order rate, payment rate, and order amount. During the Double Twelve Shopping Festival, compared with the traditional mode, the payment amount from M Company's product recommendations increased by 64,604 yuan and orders increased by 301 units, indicating that the precision marketing strategy was effective.
Journal Article
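The RFM-plus-clustering step in the precision-marketing entry can be sketched as follows. The abstract does not describe how HC-Kmeans modifies K-means, so this sketch substitutes plain K-means with a deterministic farthest-point initialization. The transaction format, the min-max scaling, and the toy data are also assumptions.

```python
import math
from datetime import date
from typing import Dict, List, Tuple

Transaction = Tuple[str, date, float]  # (customer_id, order_date, amount): assumed format


def rfm(transactions: List[Transaction], today: date) -> Dict[str, Tuple[float, float, float]]:
    """Recency (days since last order), Frequency (order count), Monetary (total spend)."""
    last: Dict[str, date] = {}
    freq: Dict[str, int] = {}
    money: Dict[str, float] = {}
    for cid, day, amount in transactions:
        last[cid] = max(last.get(cid, day), day)
        freq[cid] = freq.get(cid, 0) + 1
        money[cid] = money.get(cid, 0.0) + amount
    return {cid: (float((today - last[cid]).days), float(freq[cid]), money[cid]) for cid in last}


def min_max(rows: List[Tuple[float, ...]]) -> List[List[float]]:
    """Scale each column to [0, 1] so no single RFM dimension dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in rows]


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Plain K-means (not the paper's HC-Kmeans) with farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def segment_customers(transactions: List[Transaction], today: date, k: int) -> Dict[str, int]:
    """Map each customer to a segment label based on scaled RFM scores."""
    scores = rfm(transactions, today)
    ids = list(scores)
    labels = kmeans(min_max([scores[c] for c in ids]), k)
    return dict(zip(ids, labels))


if __name__ == "__main__":
    tx = [
        ("A", date(2024, 12, 1), 500.0), ("A", date(2024, 12, 10), 400.0),
        ("A", date(2024, 12, 11), 300.0), ("B", date(2024, 12, 9), 450.0),
        ("B", date(2024, 12, 12), 350.0), ("B", date(2024, 12, 5), 300.0),
        ("C", date(2024, 6, 1), 20.0), ("D", date(2024, 5, 20), 30.0),
    ]
    print(segment_customers(tx, today=date(2024, 12, 15), k=2))
```

Scaling matters here: without it, the monetary column (hundreds of yuan) would swamp recency and frequency in the Euclidean distance.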
A Method for Predicting Coal-Mine Methane Outburst Volumes and Detecting Anomalies Based on a Fusion Model of Second-Order Decomposition and ETO-TSMixer
2025
The ability to predict the volume of methane outbursts in coal mines is critical for preventing methane outburst accidents and ensuring coal-mine safety. This paper argues that existing prediction models are limited in several ways, notably by their complexity and poor ability to generalize. It proposes a methane outburst volume-prediction and early-warning method based on secondary decomposition and an improved TSMixer model. First, data smoothing is achieved through an STL decomposition–adaptive Savitzky–Golay filtering–reconstruction framework to reduce temporal complexity. Second, a CEEMDAN-Kmeans-VMD secondary decomposition strategy is adopted, in which intrinsic mode functions (IMFs) are integrated using K-means clustering. Variational mode decomposition (VMD) parameters are optimized via a novel exponential triangular optimization (ETO) algorithm to extract multi-scale features. In addition, a refined TSMixer model is proposed that integrates reversible instance normalization (RevIN) to improve generalizability and uses ETO to fine-tune the model's hyperparameters. This approach enables joint modeling of multiple components, thereby avoiding error accumulation. The experimental results show that the enhanced model attains RMSE, MAE, and R² values of 0.0151, 0.0117, and 0.9878, respectively, on the test set, a substantial improvement over the reference models. Furthermore, we propose an anomaly detection framework based on STL decomposition and dual isolation forests, which improves sensitivity to sudden feature changes and detection robustness through a weighted fusion of global trends and residual anomalies. This method provides efficient and reliable technical support for dynamic early warning in coal-mine gas disaster prevention and control, with significant engineering application value.
Journal Article
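The methane entry reports its test-set results as RMSE, MAE, and R². These follow standard definitions, sketched below in plain Python. The toy values in the usage lines are illustrative only.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.0, 2.0, 3.0, 5.0]
    print(rmse(y, y_hat), mae(y, y_hat), r2(y, y_hat))
```

RMSE and MAE are in the units of the target, so they are only comparable across models evaluated on the same scaled data. R² is unitless, and values close to 1 (such as the reported 0.9878) mean the residual variance is small relative to the variance of the target.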
Enhancing Sequence Movie Recommendation System Using Deep Learning and KMeans
by Ilkhomjon, Sadriddinov; Siet, Sophort; Park, Doo-Soon
in Algorithms, Artificial intelligence, Big Data
2024
A flood of information has made it challenging for people to find and filter their favorite items. Recommendation systems (RSs) have emerged as a solution to this problem. However, traditional recommendation systems, including collaborative filtering and content-based filtering, face significant challenges such as data scalability, data scarcity, and the cold-start problem, all of which require advanced solutions. We therefore propose a ranking and enhancing sequence movie recommendation system that uses a combined deep learning model to resolve these issues. To mitigate these challenges, we design an RS model that uses user information (age, gender, occupation) to analyze new users and match them with others who have similar preferences. First, we construct sequences of user behavior to effectively predict each user's potential next target movie. We then incorporate user information and movie sequence embeddings as input features to reduce dimensionality, before feeding them into a transformer architecture and a multilayer perceptron (MLP). Our model integrates a transformer layer with positional encoding for user behavior sequences and multi-head attention mechanisms to enhance prediction accuracy. Furthermore, the system applies KMeans clustering to movie genre embeddings, grouping similar movies and combining this clustering information with predicted ratings to ensure diversity in the personalized recommendations for target users. Evaluated on two MovieLens datasets (100K and 1M), our model demonstrated significant improvements. It achieved RMSE, MAE, precision, recall, and F1 scores of 1.0756, 0.8741, 0.5516, 0.3260, and 0.4098 for the 100K dataset, and 0.9927, 0.8007, 0.5838, 0.4723, and 0.5222 for the 1M dataset, respectively.
This approach not only effectively mitigates cold-start and scalability issues but also surpasses baseline techniques in Top-N item recommendation, highlighting its efficacy in today's data-abundant environment.
Journal Article
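The F1 scores reported in the movie-recommendation entry can be checked against the reported precision and recall, since F1 is their harmonic mean. This short Python check uses only the figures given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 as reported in the abstract.
reported = {
    "MovieLens 100K": (0.5516, 0.3260, 0.4098),
    "MovieLens 1M": (0.5838, 0.4723, 0.5222),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {f1_reported}")
```

Both computed values round to the reported F1 scores (0.4098 and 0.5222), so the three metrics in the abstract are internally consistent.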
Optimizing Models and Data Denoising Algorithms for Power Load Forecasting
2024
To address data imbalance and inaccurate prediction in power load forecasting, an integrated data-denoising power load forecasting method is designed. The method divides data by administrative region, industry, and load characteristics using a four-step method, extracts periodic features using the Fourier transform, and applies Kmeans++ for clustering. On this basis, a Transformer model with an adversarial adaptation mechanism is designed. It aligns the data distributions of the source and target domains through a domain discriminator and a feature extractor, thereby reducing the impact of domain shift on prediction accuracy. The Fourier-transform clustering method used in this study achieved a mean square error of 0.154, lower than that of the other methods, indicating a better data-denoising effect. In load forecasting, the model's mean square errors for long-term, short-term, and real-time load were 0.026, 0.107, and 0.107, respectively, all lower than those of the comparison models. The proposed model therefore improves the accuracy and stability of load forecasting and can provide a foundation for the precise control of urban power systems. It tracks periodicity, short-term load stochasticity, and high-frequency fluctuations in long-term loads well, and it achieves high accuracy in short-term, long-term, and real-time load forecasting.
Journal Article
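The Fourier-feature clustering step in the power-load entry can be sketched as follows. Each load curve is summarized by the magnitudes of its first few discrete Fourier transform bins. The curves are then grouped using k-means++ seeding followed by standard Lloyd iterations. The number of bins, the toy load curves, and the random seed are assumptions. The abstract's four-step regional and industry split and the adversarial Transformer are not reproduced.

```python
import cmath
import math
import random
from typing import List


def dft_magnitudes(series: List[float], n_bins: int) -> List[float]:
    """|X_k| / N for DFT bins k = 1..n_bins (bin 0, the mean level, is skipped)."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))) / n
            for k in range(1, n_bins + 1)]


def kmeans_pp(points: List[List[float]], k: int, seed: int = 0, iters: int = 100) -> List[int]:
    """k-means++ seeding (D^2 sampling) followed by standard Lloyd iterations."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Sample the next center with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(p))
                break
        else:  # guard against floating-point shortfall
            centers.append(list(points[-1]))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    hours = range(48)  # two days of hourly readings (toy data)
    daily = [[10 + a * math.sin(2 * math.pi * t / 24) for t in hours] for a in (5.0, 4.0)]
    half_day = [[10 + a * math.sin(2 * math.pi * t / 12) for t in hours] for a in (5.0, 4.0)]
    features = [dft_magnitudes(s, n_bins=4) for s in daily + half_day]
    print(kmeans_pp(features, k=2))
```

Skipping bin 0 removes the average load level, so curves are grouped by the shape of their cycles rather than by their magnitude. This is one plausible reading of "periodic features" and is not necessarily the paper's choice.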
Potato Plant Leaves Disease Detection and Classification using Machine Learning Methodologies
2021
Agriculture is one of the sectors most essential to the survival of humankind. At the same time, digitalization now reaches into every field, making many difficult tasks easier to handle. Adopting technology and digitalization is crucial in agriculture, benefiting both the farmer and the consumer. With technology and regular monitoring, diseases can be identified at a very early stage and eradicated to obtain a better crop yield. This paper proposes a methodology for detecting and classifying diseases that affect potato plants. The study used the openly accessible, standard, and reliable dataset popularly known as the Plant Village Dataset. K-means was used for image segmentation, the gray level co-occurrence matrix (GLCM) for feature extraction, and a multi-class support vector machine (SVM) for classification. The proposed methodology attains an accuracy of 95.99%.
Journal Article
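The potato-leaf entry extracts features using the gray level co-occurrence matrix (GLCM). Below is a minimal sketch for a single pixel offset. The quantized toy image, the single horizontal offset, and the choice of three Haralick-style features (contrast, angular second moment, homogeneity) are assumptions. The K-means segmentation and multi-class SVM stages are not shown.

```python
from typing import Dict, List, Tuple


def glcm(image: List[List[int]], levels: int, offset: Tuple[int, int] = (0, 1),
         symmetric: bool = True) -> List[List[float]]:
    """Normalized co-occurrence matrix P[i][j] for pixel pairs separated by `offset` (row, col)."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count the pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else counts


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


if __name__ == "__main__":
    leaf_patch = [[0, 0, 1], [0, 1, 1]]  # toy 2-level quantized gray image
    print(glcm_features(glcm(leaf_patch, levels=2)))
```

In practice, the image is first quantized to a small number of gray levels (often 8 to 64), and features from several offsets and angles are concatenated into one vector per segmented region before classification.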
Cyber-resilient machine learning framework for accurate individual load forecasting and anomaly detection in smart grids
2025
With the evolution of smart grids, accurate and secure prediction of electricity load has become crucial for efficient energy management and reliability. This paper proposes a scalable, cyber-resilient methodology for forecasting electricity consumption at the individual smart-meter level, based on machine learning and anomaly detection schemes. The proposed technique uses K-MEANS clustering and neural networks (KMEANS–NN) to enhance Individual Load Forecasting (ILF) with reduced computational complexity and high prediction accuracy. A Principal Component Analysis based One-Class Support Vector Machine (PCA–OCSVM) model is employed as an Anomaly Detection Scheme (ADS) to identify false data injection attacks in smart meter telemetry. The system uses five months of real-world data from […] smart meters gathered under the supervision of the Electrical Distribution Sector (EDS) of the Suez Canal Authority (SCA) in Egypt. The KMEANS–NN strategy significantly reduces MAAPE, by up to […], and cuts computational time from days to minutes. It improves forecasting accuracy across four models: ARIMA, CTREE, MLP, and NNETAR. To assess the cyber-security profile, […] of the dataset is subjected to scaling, ramping, and random cyber-attack simulations. The proposed ADS achieves […] overall accuracy, […] sensitivity, […] precision, […] specificity, and an F1-score of […], whereas it is […] accurate on clean data. This integrated model offers accurate, efficient, and secure load forecasting and shows good potential for deployment in large-scale smart grid environments.
Journal Article
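The smart-grid entry quotes its error measure as MAAPE. This is commonly defined as the mean of arctan(|(A_t - F_t) / A_t|) over all time steps, which keeps each term bounded even when actual values are near zero. Here is a minimal sketch; the handling of exactly-zero actuals is an assumption made here.

```python
import math
from typing import Sequence


def maape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Arctangent Absolute Percentage Error in radians, bounded in [0, pi/2]."""
    terms = []
    for a, f in zip(actual, forecast):
        if a == 0:
            # Assumption: 0/0 counts as no error; x/0 with x != 0 is arctan(inf) = pi/2.
            terms.append(0.0 if f == 0 else math.pi / 2)
        else:
            terms.append(math.atan(abs((a - f) / a)))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(maape([100.0, 120.0, 0.0], [110.0, 120.0, 5.0]))
```

This boundedness is why MAAPE suits individual smart-meter data, where near-zero consumption readings would make the ordinary MAPE explode.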
Preliminary Study on Sapphire Color Grading Method Based on Automatic Clustering Algorithm of Color Space Features
2021
Traditionally, sapphire color grading has relied mainly on the naked-eye judgment of the appraiser. This standard is not clearly defined, and the result is strongly influenced by subjectivity, which affects grading accuracy. In this study, a GEM-3000 ultraviolet-visible spectrophotometer was used to extract the color features of 180 sapphire samples in the device's CIE1976 color space. The Kmeans algorithm was used to cluster 140 of the samples. This verified that the color-space features of different color grades are separable and yielded the center sample of each color grade. For each of the remaining 40 samples, the Euclidean distance to each grade center was calculated to determine its predicted color-grade label, so that sapphire color is classified automatically. The experimental results show that this method classifies sapphire color with an accuracy of 97.5%, confirming the effectiveness and accuracy of the artificial intelligence method for sapphire color classification.
Journal Article
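The grading step in the sapphire entry can be sketched as follows. Grade centers are computed from the clustered training samples, and each remaining sample is then assigned to the nearest center by Euclidean distance. The sketch assumes the CIE 1976 coordinates are L*a*b*. The coordinates and grade names below are made-up placeholders. If the reported 97.5% refers to the 40 held-out samples, it corresponds to 39 correct predictions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Lab = Tuple[float, float, float]  # assumed (L*, a*, b*) coordinates in CIE 1976 L*a*b*


def grade_centers(samples: List[Lab], grades: List[str]) -> Dict[str, Tuple[float, ...]]:
    """Mean L*a*b* coordinate of the samples assigned to each grade (e.g. by K-means)."""
    groups: Dict[str, List[Lab]] = {}
    for s, g in zip(samples, grades):
        groups.setdefault(g, []).append(s)
    return {g: tuple(sum(col) / len(members) for col in zip(*members))
            for g, members in groups.items()}


def predict_grade(sample: Lab, centers: Dict[str, Tuple[float, ...]]) -> str:
    """Nearest center by Euclidean distance (in L*a*b*, the CIE 1976 color difference)."""
    return min(centers, key=lambda g: math.dist(sample, centers[g]))


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of predictions that match the reference grade."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    train = [(40.0, 10.0, -50.0), (42.0, 12.0, -48.0), (60.0, 5.0, -30.0), (62.0, 4.0, -28.0)]
    grades = ["deep blue", "deep blue", "light blue", "light blue"]  # hypothetical grade names
    centers = grade_centers(train, grades)
    held_out = [(45.0, 9.0, -45.0), (58.0, 6.0, -32.0)]
    truth = ["deep blue", "light blue"]
    predicted = [predict_grade(s, centers) for s in held_out]
    print(predicted, accuracy(predicted, truth))
```

Euclidean distance in L*a*b* coincides with the CIE 1976 color-difference formula, which is why a nearest-center rule is a natural fit for this space.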