MBRL Search Results

3,537 results for "supervised models"
Predictive maintenance in Industry 4.0: a survey of planning models and machine learning techniques
Equipment downtime resulting from maintenance has become a major concern in sectors around the globe. Conventional reactive maintenance methods are no longer adequate for handling interruptions and improving operational efficiency, so the constraints of reactive maintenance must be acknowledged alongside the growing need for approaches that proactively detect possible breakdowns. This shift is driven by industries' demand to optimise asset management and reduce costly downtime. The work highlights Internet of Things (IoT)-enabled Predictive Maintenance (PdM) as a revolutionary strategy across many sectors and presents a picture of a future in which IoT technology and sophisticated analytics enable probable equipment failures to be predicted and proactively mitigated. This literature study thoroughly explores the steps and techniques necessary for developing and implementing efficient PdM solutions, and, by analysing current information and approaches, offers useful insights into optimising maintenance methods and enhancing operational efficiency. The article outlines the essential stages in applying PdM, encompassing underlying design factors, data preparation, feature selection, and decision modelling, and discusses a range of ML models and methodologies for condition monitoring. Ongoing research and improvement in the field of PdM must be prioritised in order to enhance maintenance plans. Incorporating IoT, Artificial Intelligence (AI), and advanced analytics holds significant potential for boosting PdM capabilities and keeping companies competitive in the global economy.
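Purely as an illustration of the pipeline stages the survey names (data preparation, feature selection, decision modelling), here is a minimal scikit-learn sketch; the synthetic sensor features, threshold, and model choice are assumptions, not anything taken from the article.

```python
# Minimal sketch of a PdM decision pipeline over the stages the survey
# outlines: data preparation -> feature selection -> decision modelling.
# The synthetic sensor data and failure rule are illustrative assumptions.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))  # e.g. vibration, temperature, current readings
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 1.2).astype(int)  # 1 = imminent failure

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # data preparation
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=4)),        # feature selection
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),  # decision modelling
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
pipe.fit(X_tr, y_tr)
print(classification_report(y_te, pipe.predict(X_te)))
```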
Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain
Dataset size is a major concern in the medical domain, where lack of data is a common occurrence. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely used models in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicate that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust is DT. Furthermore, an interesting observation is that robustness to limited data does not necessarily imply that a model provides the best performance compared to other models.
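A minimal sketch of the size-reduction protocol described above: the same six model families are retrained on progressively smaller stratified subsets and compared on a fixed held-out test set. The dataset (scikit-learn's bundled breast cancer data) and the size fractions are illustrative assumptions.

```python
# Sketch of the dataset-size protocol: train the same classifiers on
# progressively smaller stratified subsets and compare test accuracy.
# The dataset and the size fractions are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"SVM": SVC(), "NN": MLPClassifier(max_iter=2000), "DT": DecisionTreeClassifier(),
          "RF": RandomForestClassifier(), "AB": AdaBoostClassifier(), "NB": GaussianNB()}

for frac in (1.0, 0.5, 0.25, 0.1):                  # size-reduction scenarios
    if frac < 1.0:
        X_sub, _, y_sub, _ = train_test_split(X_tr, y_tr, train_size=frac,
                                              stratify=y_tr, random_state=0)
    else:
        X_sub, y_sub = X_tr, y_tr
    accs = {n: accuracy_score(y_te, m.fit(X_sub, y_sub).predict(X_te))
            for n, m in models.items()}
    print(f"{frac:>4.0%}", {n: round(a, 3) for n, a in accs.items()})
```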
Joint graph and reduced flexible manifold embedding for scalable semi-supervised learning
Recently, graph-based semi-supervised learning (GSSL) has received much attention, yet far less attention has been paid to large-scale GSSL for inductive multi-class classification. Existing scalable GSSL methods rely on a hard linear constraint, cannot predict the labels of test samples, or use predefined graphs, all of which limits their applications and performance. In this paper, we propose an inductive algorithm that can handle large databases by using anchors. The main contribution over existing scalable semi-supervised models is the integration of the anchor graph computation into the learned model. We develop a criterion that jointly estimates the labels of the unlabeled samples, the mapping from the feature space to the label space, and the affinity matrix of the anchor graph. Furthermore, the fusion of the labels and features of the anchors is used to construct the graph. Using the projection matrix, the method can also predict the labels of the test samples by linear transformation. Experimental results on the large datasets NORB, RCV1 and Covtype show the effectiveness, scalability and superiority of the proposed method. The code of the proposed method can be found at https://github.com/ZoulfikarIB/SGRFME.
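Illustrating only the anchor idea, not the paper's joint criterion: a minimal sketch where anchors come from k-means, a sparse sample-to-anchor affinity Z is built, and labels are inferred through the anchors. All choices here (20 anchors, Gaussian weights, least-squares anchor labels) are assumptions for demonstration.

```python
# Minimal anchor-graph sketch in the spirit of scalable GSSL: k-means anchors,
# a sparse sample-to-anchor affinity Z, and label inference through the anchors.
# This is NOT the paper's joint model; it only illustrates the anchor idea.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def anchor_affinity(X, anchors, s=3, sigma=1.0):
    """Row-stochastic affinity from each sample to its s nearest anchors."""
    D = pairwise_distances(X, anchors)
    Z = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, :s]
    for i, nn in enumerate(idx):
        w = np.exp(-D[i, nn] ** 2 / (2 * sigma ** 2))
        Z[i, nn] = w / w.sum()
    return Z

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.r_[np.zeros(200), np.ones(200)].astype(int)
labeled = rng.choice(400, size=10, replace=False)      # few labeled samples

anchors = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X).cluster_centers_
Z = anchor_affinity(X, anchors)

# Least-squares anchor labels from the labeled rows, then propagate: F = Z A
Y = np.eye(2)[y[labeled]]
A, *_ = np.linalg.lstsq(Z[labeled], Y, rcond=None)
pred = (Z @ A).argmax(axis=1)
print("accuracy on all samples:", (pred == y).mean())
```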
Assessing the performance of machine learning models for default prediction under missing data and class imbalance: A simulation study
In the field of machine learning, robust model performance is essential for accurate predictions and informed decision-making. One critical challenge that hampers the performance of machine learning algorithms is the presence of missing data. Missing values are ubiquitous in real-world datasets and can substantially impact the performance of predictive models. This study explored the impact of increasing levels of missing values on the performance of machine learning models. Simulated samples with missing values ranging from 5% to 50% were generated, and various models were evaluated accordingly. The results demonstrated a consistent trend of deteriorating model performance as the amount of missing values increased: higher levels of missing values led to lower accuracy scores across all models. Among the models evaluated, decision trees (DT) and random forests (RF) consistently achieved high accuracy scores across all sampling techniques, showcasing their robustness in handling missing values. Logistic regression (LR) also performed relatively well, showing consistent performance across different levels of missing values. On the other hand, the stochastic gradient descent classifier (SGDC), k-nearest neighbours (kNN), and naïve Bayes (NB) models consistently exhibited lower accuracy scores across all sampling techniques, indicating limitations in handling missing values even when the dataset was more balanced. Furthermore, the study highlights the superiority of SMOTE (Synthetic Minority Over-sampling Technique) over the under-sampling approach: models trained using SMOTE consistently achieved higher accuracy scores across all levels of missing values. This suggests that SMOTE effectively handles imbalanced datasets and enhances classification performance, particularly when dealing with missing values. Addressing the pervasive challenge of missing data thus emerges as a cornerstone for unlocking the true potential of machine learning in real-world applications.
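A compact sketch of the simulation design: inject increasing levels of missing values, impute, rebalance with SMOTE or random under-sampling (via the imbalanced-learn library), and score a classifier. The synthetic dataset, missingness rates, and logistic regression scorer are illustrative assumptions.

```python
# Sketch of the simulation design: inject increasing missingness, impute,
# rebalance with SMOTE vs. random under-sampling, and score a classifier.
# Requires imbalanced-learn; dataset and rates are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for rate in (0.05, 0.25, 0.50):                      # missing-value levels
    Xm = X_tr.copy()
    Xm[rng.random(Xm.shape) < rate] = np.nan          # inject missingness
    Xi = SimpleImputer(strategy="mean").fit_transform(Xm)
    for name, sampler in (("SMOTE", SMOTE(random_state=0)),
                          ("UNDER", RandomUnderSampler(random_state=0))):
        Xb, yb = sampler.fit_resample(Xi, y_tr)
        acc = accuracy_score(y_te, LogisticRegression(max_iter=1000).fit(Xb, yb).predict(X_te))
        print(f"missing={rate:.0%} {name}: acc={acc:.3f}")
```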
DLCPD-25: A Large-Scale and Diverse Dataset for Crop Disease and Pest Recognition
The accurate identification of crop pests and diseases is critical for global food security, yet the development of robust deep learning models is hindered by the limitations of existing datasets. To address this gap, we introduce DLCPD-25, a new large-scale, diverse, and publicly available benchmark dataset. We constructed DLCPD-25 by integrating 221,943 images from both online sources and extensive field collections, covering 23 crop types and 203 distinct classes of pests, diseases, and healthy states. A key feature of this dataset is its realistic complexity, including images from uncontrolled field environments and a natural long-tail class distribution, which contrasts with many existing datasets collected under controlled conditions. To validate its utility, we pre-trained several state-of-the-art self-supervised learning models (MAE, SimCLR v2, MoCo v3) on DLCPD-25. The learned representations, evaluated via linear probing, demonstrated strong performance, with the SimCLR v2 framework achieving a top accuracy of 72.1% and an F1 score (Macro F1) of 71.3% on a downstream classification task. Our results confirm that DLCPD-25 provides a valuable and challenging resource that can effectively support the training of generalizable models, paving the way for the development of comprehensive, real-world agricultural diagnostic systems.
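A short sketch of the linear-probing evaluation mentioned above: freeze a pre-trained encoder and train only a linear classifier on its features. A torchvision ResNet-50 stands in for the self-supervised backbones (MAE, SimCLR v2, MoCo v3), since their DLCPD-25 weights are not assumed to be at hand; the 203-way output follows the class count quoted in the abstract.

```python
# Sketch of linear probing: the encoder stays frozen and only a linear
# classifier is trained on top. ResNet-50 is a stand-in backbone here.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = nn.Identity()                  # expose 2048-d features
for p in encoder.parameters():
    p.requires_grad = False                 # linear probing: encoder is frozen
encoder.eval()

probe = nn.Linear(2048, 203)                # one logit per pest/disease/healthy class
opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def probe_step(images, labels):
    with torch.no_grad():
        feats = encoder(images)             # frozen features
    loss = loss_fn(probe(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# e.g. probe_step(torch.randn(8, 3, 224, 224), torch.randint(0, 203, (8,)))
```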
Enhanced Gold Ore Classification: A Comparative Analysis of Machine Learning Techniques with Textural and Chemical Data
Specific computational methods, such as machine learning algorithms, can help mining professionals quickly and consistently identify and address classification issues related to mineralized horizons, and can uncover key variables that impact predictive outcomes, many of which were previously difficult to observe. Integrating the numerical and categorical variables that make up an ore-grade dataset is part of the daily routine of the professionals who obtain the data and handle the various phases of analysis in a mining project. Supervised and unsupervised machine learning methods integrate a wide variety of algorithms that aim at the efficient recognition of patterns and similarities and support accurate, assertive decisions. The objective of this study is the classification of gold ore or gangue through supervised machine learning methods, using numerical variables represented by grade and categorical variables obtained from drillhole descriptions. Four groups of variables with different configurations were selected. Applying classification algorithms to the different variable groups made it possible to observe the important variables and the impact of each one on the classification, in addition to identifying the best algorithm in terms of accuracy and precision. The datasets were subjected to training, validation, and testing using decision tree, random forest, AdaBoost, XGBoost, and logistic regression methods. The data were randomly divided into training (60%) and testing (40%) sets with 10-fold cross-validation. The results revealed that the XGBoost algorithm achieved the best performance, with an accuracy of 0.96 for scenario C1. In the SHAP analysis, the variable As was prominent in the predictions, mainly in scenarios C1 and C3; the arsenic class (Class_As), present mainly in scenario C4, had a significant positive weight in the classification. In the Receiver Operating Characteristic (ROC) analysis, XGBoost on scenario C1 obtained the highest Area Under the Curve (AUC) of 0.985, indicating the best ore/gangue classification performance on the sample set. The logistic regression algorithm, together with AdaBoost, had the worst performance, which also varied between scenarios.
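A rough sketch of the evaluation protocol: a 60/40 split, an XGBoost classifier with 10-fold cross-validation, AUC scoring, and SHAP attributions (requires the xgboost and shap packages). The synthetic features stand in for the grade and drillhole variables of the actual study.

```python
# Sketch of the protocol: 60/40 split, XGBoost, 10-fold CV, AUC, and SHAP.
# The synthetic "grade" features are stand-ins for the drillhole variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
import shap

X, y = make_classification(n_samples=1500, n_features=8, random_state=0)  # ore (1) vs gangue (0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
print("10-fold CV accuracy:", cross_val_score(model, X_tr, y_tr, cv=10).mean())

model.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

explainer = shap.TreeExplainer(model)       # per-feature contribution to each prediction
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0).round(3))
```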
Supervised Multi-Layer Conditional Variational Auto-Encoder for Process Modeling and Soft Sensor
Variational auto-encoders (VAE) have been widely used in process modeling owing to their deep feature extraction ability and noise robustness. However, constructing a supervised VAE model still faces huge challenges: the data generated by existing supervised VAE models are unstable and uncontrollable because of random resampling in the latent subspace, which greatly weakens prediction performance. In this paper, a new multi-layer conditional variational auto-encoder (M-CVAE) is constructed by injecting label information into the latent subspace to steer the generated output towards the actual value. Furthermore, the label information is also used as an input alongside the process variables in order to strengthen the correlation between input and output. Finally, a neural network layer is embedded in the encoder of the model to achieve online quality prediction. The superiority and effectiveness of the proposed method are demonstrated on two real industrial process cases and compared with other methods.
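To make the conditioning idea concrete, here is a generic conditional-VAE sketch in PyTorch, not the authors' exact M-CVAE: the label is concatenated with the process variables at the encoder and with the latent code at the decoder, so generation is steered by the label rather than left to free resampling. All dimensions are illustrative assumptions.

```python
# Generic conditional VAE: the quality label y conditions both the encoder
# input and the decoder input, steering generation towards the labelled value.
# This illustrates the conditioning idea only, not the exact M-CVAE.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=16, y_dim=1, z_dim=4, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + y_dim, h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + y_dim, h), nn.ReLU(),
                                 nn.Linear(h, x_dim))

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=1))                 # label enters the encoder
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
        x_hat = self.dec(torch.cat([z, y], dim=1))             # label steers the decoder
        return x_hat, mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    rec = ((x - x_hat) ** 2).sum(dim=1).mean()                 # reconstruction
    kld = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
    return rec + kld

model = CVAE()
x, y = torch.randn(32, 16), torch.randn(32, 1)   # process variables, quality label
x_hat, mu, logvar = model(x, y)
print(loss_fn(x, x_hat, mu, logvar))
```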
Using Satellite Telemetry and Aerial Counts to Estimate Space Use by Grey Seals around the British Isles
1. In the UK, resolving conflicts between the conservation of grey seals, the management of fish stocks and marine exploitation requires knowledge of the seals' use of space. We present a map of grey seal usage around the British Isles based on satellite telemetry data from adult animals and haul-out survey data.
2. Our approach combined modelling and interpolation. To model the seals' association with particular coastal sites (the haul-outs), we divided the population into sub-populations associated with 24 haul-out groups. Haul-out-specific maps of accessibility were used to supervise usage estimation from satellite telemetry. The mean and variance of seal numbers at each haul-out group were obtained from haul-out counts. The aggregate map of usage for the entire population was produced by adding together the haul-out-specific usage maps, each weighted by the mean number of animals using that haul-out.
3. Seal usage was primarily concentrated (i) off the northern coasts of the British Isles, (ii) closer to the coast than might be expected purely on the basis of accessibility from the haul-outs and (iii) in a limited number of marine hot-spots.
4. Although our results currently represent the best estimate of how grey seals use the marine environment around Britain, they are neither definitive nor equally precise for all haul-outs. Further data collection should focus on the south-west of the British Isles, and aerial counts should be repeated for all haul-outs.
5. Synthesis and applications. This work provides environmental managers with current estimates of grey seal usage and describes a methodology for maximizing data efficiency. Our results could guide government departments in licensing marine exploitation by the oil industry, in estimating grey seal predation pressure on vulnerable or economically important prey and in delineating marine special areas of conservation (SAC). Our finding that grey seal usage is characterized by a limited number of hot-spots means that the species is particularly suited to localized conservation efforts.
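The aggregation in point 2 reduces to a count-weighted sum of per-haul-out usage surfaces; a tiny NumPy sketch follows, with the grid size, counts, and random surfaces as stand-in assumptions.

```python
# Sketch of the aggregation step: haul-out-specific usage maps (probability
# surfaces over a shared grid) are weighted by the mean count at each haul-out
# and summed into a population-level usage map. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_haulouts, grid = 24, (50, 50)

# Per-haul-out usage surfaces, each normalised to sum to 1.
usage = rng.random((n_haulouts, *grid))
usage /= usage.sum(axis=(1, 2), keepdims=True)

mean_counts = rng.integers(50, 2000, size=n_haulouts)   # mean seals per haul-out

# Population map: count-weighted sum of the per-haul-out maps.
population_usage = np.tensordot(mean_counts, usage, axes=1)
print(population_usage.shape, population_usage.sum(), mean_counts.sum())
```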
Voxel-wise segmentation for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks
Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all samples of a batch, X-ray computed tomography (X-CT) is often used in combination with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly used, as they can be trained to be robust to the material being analysed and resilient to poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. Additionally, there is a notable absence of comparisons between supervised and unsupervised models for voxel-wise pore segmentation tasks. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet, ACC-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE, RV-VAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch approach for lower computational requirements, improved efficiency and better generalisability. The supervised models were trained using the Focal Tversky loss to address the class imbalance that arises from the low porosity of the training datasets. The output of the unsupervised models was post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include a performance benchmark of the DL models, an evaluation of the post-processing algorithm, and an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best-performing supervised model was UNet++ with an average precision of 0.751 ± 0.030, while the best unsupervised model was the post-processed ceVAE with 0.830 ± 0.003. Notably, the ceVAE model with its post-processing technique exhibited superior capabilities, endorsing unsupervised learning as the preferred approach for the voxel-wise pore segmentation task.
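For concreteness, a sketch of the Focal Tversky loss used to counter the sparse-pore class imbalance: alpha and beta trade off false negatives against false positives, and gamma focuses training on hard examples. The parameter values shown are commonly used ones, not necessarily those of this study.

```python
# Focal Tversky loss for imbalanced voxel-wise segmentation: the Tversky index
# TP / (TP + a*FN + b*FP) is raised to a focusing exponent gamma.
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """pred: sigmoid probabilities, target: binary mask, same shape."""
    pred, target = pred.flatten(), target.flatten()
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma

pred = torch.rand(1, 1, 32, 32, 32)             # e.g. a 3D patch of pore probabilities
mask = (torch.rand_like(pred) > 0.98).float()   # sparse pore voxels
print(focal_tversky_loss(pred, mask))
```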
A deep learning approach to dysarthric utterance classification with BiLSTM-GRU, speech cue filtering, and log mel spectrograms
Assessing the intelligibility of dysarthric speech, which is characterized by intricate speaking rhythms, presents formidable challenges. Current techniques for objectively testing speech intelligibility are burdensome and subjective, and they particularly struggle with dysarthric spoken utterances. To tackle these hurdles, our method conducts an ablation analysis across speakers afflicted with speech impairment. We use a unified approach that incorporates both auditory and visual elements to improve the classification of dysarthric spoken utterances, and we propose employing two distinctive extractive transformer-based approaches. First, we employ SepFormer to refine the speech signal, prioritizing the enhancement of signal clarity. We then convert the improved audio samples into log mel spectrograms and feed them into a Swin transformer. Additionally, we harness the power of the Swin transformer for visual classification, trained on a dataset of 14 million annotated images from ImageNet. The pre-trained scores from the Swin transformer serve as input to a deep bidirectional long short-term memory with gated recurrent unit (deep BiLSTM-GRU) model, which classifies the spoken utterances. Our proposed deep BiLSTM-GRU model yields impressive results on the EasyCall speech corpus, which spans sets of 10 to 20 spoken utterances delivered by both healthy individuals and individuals with dysarthria. Notably, our results show an accuracy of 98.56% for 20 utterances from male speakers, 95.11% from female speakers, and 97.64% from male and female speakers combined. Across diverse scenarios, our approach consistently achieves high accuracy, surpassing other contemporary methods, all without requiring data augmentation.
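A sketch of two pieces of this pipeline: converting a waveform into a log mel spectrogram with librosa, and a small BiLSTM-followed-by-GRU classifier head in PyTorch. Shapes, layer sizes, and the 20-class output are illustrative assumptions, not the authors' exact configuration.

```python
# Log mel spectrogram extraction plus a compact BiLSTM -> GRU classifier head.
# Hyperparameters and the dummy waveform are illustrative assumptions.
import numpy as np
import librosa
import torch
import torch.nn as nn

# Log mel spectrogram from a (possibly enhanced) waveform.
wav, sr = np.random.randn(16000).astype(np.float32), 16000   # 1 s dummy utterance
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel)                            # shape (80, frames)

class BiLSTMGRU(nn.Module):
    def __init__(self, n_mels=80, hidden=128, n_classes=20):
        super().__init__()
        self.bilstm = nn.LSTM(n_mels, hidden, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, frames, n_mels)
        x, _ = self.bilstm(x)
        x, _ = self.gru(x)
        return self.fc(x[:, -1])           # classify from the last frame state

x = torch.tensor(log_mel.T).unsqueeze(0)   # (1, frames, 80)
print(BiLSTMGRU()(x).shape)                # (1, 20) class logits
```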