Catalogue Search | MBRL

Compressive sensing for wireless networks

by Han, Zhu, 1974- , Li, Husheng, 1975- , Yin, Wotao in Coding theory. , Data compression (Telecommunication) , Signal processing Digital techniques.

Book

Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis

by Imani, Mehdi , Arabnia, Hamid Reza in Accuracy , Algorithms , Artificial neural networks

2023

This paper explores the application of various machine learning techniques for predicting customer churn in the telecommunications sector. We utilized a publicly accessible dataset and implemented several models, including Artificial Neural Networks, Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, and gradient boosting techniques (XGBoost, LightGBM, and CatBoost). To mitigate the challenges posed by imbalanced datasets, we adopted different data sampling strategies, namely SMOTE, SMOTE combined with Tomek Links, and SMOTE combined with Edited Nearest Neighbors. Moreover, hyperparameter tuning was employed to enhance model performance. Our evaluation employed standard metrics, such as Precision, Recall, F1-score, and the Receiver Operating Characteristic Area Under Curve (ROC AUC). In terms of the F1-score metric, CatBoost demonstrates superior performance compared to other machine learning models, achieving an outstanding 93% following the application of Optuna hyperparameter optimization. In the context of the ROC AUC metric, both XGBoost and CatBoost exhibit exceptional performance, recording remarkable scores of 91%. This achievement for XGBoost is attained after implementing a combination of SMOTE with Tomek Links, while CatBoost reaches this level of performance after the application of Optuna hyperparameter optimization.

Journal Article

Unrestricted mixed data sampling (MIDAS): MIDAS regressions with unrestricted lag polynomials

by Foroni, Claudia , Schumacher, Christian , Marcellino, Massimiliano in Aggregation , Data sampling , Distributed lag polynomials

2015

Mixed data sampling (MIDAS) regressions allow us to estimate dynamic equations that explain a low frequency variable by high frequency variables and their lags. When the difference in sampling frequencies between the regressand and the regressors is large, distributed lag functions are typically employed to model dynamics avoiding parameter proliferation. In macroeconomic applications, however, differences in sampling frequencies are often small. In such a case, it might not be necessary to employ distributed lag functions. We discuss the pros and cons of unrestricted lag polynomials in MIDAS regressions. We derive unrestricted-MIDAS (U-MIDAS) regressions from linear high frequency models, discuss identification issues and show that their parameters can be estimated by ordinary least squares. In Monte Carlo experiments, we compare U-MIDAS with MIDAS with functional distributed lags estimated by non-linear least squares. We show that U-MIDAS performs better than MIDAS for small differences in sampling frequencies. However, with large differing sampling frequencies, distributed lag functions outperform unrestricted polynomials. The good performance of U-MIDAS for small differences in frequency is confirmed in empirical applications on nowcasting and short-term forecasting euro area and US gross domestic product growth by using monthly indicators.

Journal Article

Impact of Data Processing Techniques on AI Models for Attack-Based Imbalanced and Encrypted Traffic within IoT Environments

by Kim, Hwankuk , Won, Chaeeun , Kim, Yeasul in Artificial intelligence , Data analysis , Data processing

2026

With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range of devices from non-encrypted ones to fully encrypted ones. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed using two ensemble models and three Deep Learning (DL) models from various perspectives. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score of encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, the recall in the UNSW-NB15 (Encrypted) dataset improved by up to 23.0%, and in the CICIoT-2023 (Encrypted) dataset by 20.26%, showing a similar level of improvement. Notably, in CICIoT-2023, the F1-score and the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.

Journal Article

Class-Difficulty Based Methods for Long-Tailed Visual Recognition

by Ohashi, Hiroki , Nakamura, Katsuyuki , Sinha, Saptarshi in Artificial neural networks , Data sampling , Datasets

2022

Long-tailed datasets are very frequently encountered in real-world use cases where a few classes or categories (known as majority or head classes) have a higher number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have come up with multiple weighted loss and data re-sampling techniques in efforts to reduce the bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weightage or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets, such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.

Journal Article

An Efficient and Accurate Convolution-Based Similarity Measure for Uncertain Trajectories

by Li, Guanyao , Zhang, Ji , Xiong, Simin in Convolution , data collection , Data sampling

2023

With the rapid development of localization techniques and the prevalence of mobile devices, massive amounts of trajectory data have been generated, playing essential roles in areas of user analytics, smart transportation, and public safety. Measuring trajectory similarity is one of the fundamental tasks in trajectory analytics. Although considerable research has been conducted on trajectory similarity, the majority of existing approaches measure the similarity between two trajectories by calculating the distance between aligned locations, leading to challenges related to uncertain trajectories (e.g., low and heterogeneous data sampling rates, as well as location noise). To address these challenges, we propose Contra, a convolution-based similarity measure designed specifically for uncertain trajectories. The main focus of Contra is to identify the similarity of trajectory shapes while disregarding the time/order relevance of each record within the trajectory. To this end, it leverages a series of convolution and pooling operations to extract high-level geo-information from trajectories, and subsequently compares their similarities based on these extracted features. Moreover, we introduce efficient trajectory index strategies to enhance the computational efficiency of our proposed measure. We conduct comprehensive experiments on two trajectory datasets to evaluate the performance of our proposed approach. The experiments on both datasets show the effectiveness and efficiency of our approach. Specifically, the mean rank of Contra is 3 times better than the state-of-the-art approaches, and the precision of Contra surpasses baseline approaches by 20–40%.

Journal Article

Association between ambient air pollutants and upper respiratory tract infection and pneumonia disease burden in Thailand from 2000 to 2022: a high frequency ecological analysis

by Koo, Joel Ruihan , Janhavi, A. , Lim, Jue Tao in Aerosols , Air Pollutants - adverse effects , Air Pollutants - analysis

2023

Background: A pertinent risk factor for upper respiratory tract infections (URTIs) and pneumonia is exposure to major ambient air pollutants, with short-term exposures to different air pollutants being shown to exacerbate several respiratory conditions. Methods: Here, using disease surveillance data comprising reported disease case counts at the province level, together with high frequency ambient air pollutant and climate data in Thailand, we delineated the association between ambient air pollution and URTI/pneumonia burden in Thailand from 2000 to 2022. We developed mixed-data sampling methods and estimation strategies to account for the high frequency nature of ambient air pollutant concentration data. These were used to evaluate the association between past concentrations of fine particulate matter (PM2.5), sulphur dioxide (SO2), and carbon monoxide (CO) and disease case counts, after controlling for confounding meteorological and disease factors. Results: Across provinces, we found that past increases in CO, SO2, and PM2.5 concentrations were associated with changes in URTI and pneumonia case counts, but the direction of the association was mixed. The contributive burden of past ambient air pollutants on contemporaneous disease burden was also found to be larger than that of meteorological factors, and comparable to that of disease-related factors. Conclusions: By developing a novel statistical methodology, we avoided subjective variable selection and discretization bias in detecting associations, and provided a robust estimate of the effect of ambient air pollutants on URTI and pneumonia burden over a large spatial scale.

Journal Article

Forecasting carbon dioxide emissions based on a hybrid of mixed data sampling regression model and back propagation neural network in the USA

by Calin, Adrian Cantemir , Han, Meng , Zhao, Xin in Algorithms , Aquatic Pollution , Atmospheric Protection/Air Quality Control/Air Pollution

2018

The accurate forecast of carbon dioxide emissions is critical for policy makers to take proper measures to establish a low carbon society. This paper discusses a hybrid of the mixed data sampling (MIDAS) regression model and the back propagation (BP) neural network (MIDAS-BP model) to forecast carbon dioxide emissions. The analysis uses mixed frequency data to study the effects of quarterly economic growth on annual carbon dioxide emissions. The forecasting ability of MIDAS-BP is remarkably better than that of the MIDAS, ordinary least squares (OLS), polynomial distributed lags (PDL), autoregressive distributed lags (ADL), and auto-regressive moving average (ARMA) models. The MIDAS-BP model is suitable for forecasting carbon dioxide emissions for both the short and longer term. This research is expected to influence the methodology for forecasting carbon dioxide emissions by improving the forecast accuracy. Empirical results show that economic growth has both negative and positive effects on carbon dioxide emissions that last 15 quarters. Carbon dioxide emissions are also affected by their own change within 3 years. Therefore, there is a need for policy makers to explore an alternative way to develop the economy, especially applying new energy policies to establish a low carbon society.

Journal Article

Impact of Data Corruption and Operating Temperature on Performance of Model-Based SoC Estimation

by Stojcevski, Alex , Mekhilef, Saad , Shrivastava, Prashant in Accuracy , Algorithms , Analysis

2024

Electric vehicles (EVs) are becoming popular around the world. Making a lithium battery (LIB) pack with a robust battery management system (BMS) for an EV to operate under different complex environments is both a challenge and a requirement for engineers. A BMS can intelligently manage LIB systems by estimating the battery state of charge (SoC). Due to the nonlinear characteristics of LIB, influenced by factors such as the harsh environment and data corruption caused by electromagnetic interference (EMI) inside electric vehicles, SoC estimation should consider available capacity, model parameters, operating temperature and reductions in data sampling time. The widely used model-based algorithms, such as the extended Kalman filter (EKF), have limitations. Therefore, a detailed review of the balance between temperature, data sampling time, and different model-based algorithms is necessary. Firstly, a state of charge versus open-circuit voltage (SoC-OCV) curve of the LIB is obtained by the polynomial curve fitting (PCF) method. Secondly, a first-order RC (1-RC) equivalent circuit model (ECM) is applied to identify the battery parameters using a forgetting factor-based recursive least squares algorithm (FF-RLS), ensuring accurate internal battery parameters for the next step of SoC estimation. Thirdly, different model-based algorithms are utilized to estimate the SoC of the LIB under various operating temperatures and data sampling times. Finally, experimental data from the dynamic stress test (DST) are collected at temperatures of 10 °C, 25 °C, and 40 °C to verify and analyze the impact of operating temperature and data sampling time, providing a practical reference for SoC estimation.

Journal Article

Ensemble-Based Machine Learning Algorithms Combined with Near Miss Method for Software Bug Prediction

by Nehéz, Károly , Khleel, Nasraldeen Alnor Adam , Hisaen, Ahmed in Accuracy , Algorithms , Bagging

2025

Software bug prediction (SBP) involves identifying or categorizing software modules likely to contain defects, utilizing underlying system properties such as software metrics. SBP plays a crucial role in enhancing software project quality and mitigating maintenance risks. Numerous machine learning (ML) algorithms have been developed to predict software bugs. Class imbalance poses a significant challenge for these algorithms, impeding their effectiveness and resulting in imbalanced false-positive and false-negative outcomes. However, limited research has been conducted to specifically tackle the issue of class imbalance in the context of SBP. This study investigates the prediction performance of homogeneous ensemble methods (bagging, boosting, and voting classifiers (VC)) combined with under-sampling methods to address the class imbalance problem and improve the accuracy of SBP. Two ensembles are classified as bagging ensembles: decision tree (DT) and random forest (RF); two ensembles are classified as boosting ensembles: AdaBoost (AB) and gradient boosting (GB); while DT, RF, K-Nearest Neighbours (K-NN), and support vector machine (SVM) are considered as VC. To establish the effectiveness of the proposed models, the experiments were conducted on the available benchmark datasets, which comprise five public datasets based on both class-level and file-level metrics. We compared and evaluated the performance of the proposed models according to several performance measures, namely accuracy, precision, recall, f-measure, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic curve (AUROC). The experimental findings demonstrated that the proposed models exhibit superior efficiency in predicting software bugs on balanced datasets compared to the original datasets, with an improvement of up to 11% in accuracy for the class-level metrics and 10% for the file-level metrics. The results indicate that the use of data sampling techniques had a positive impact on the prediction accuracy of the presented models. We compared our proposed method with existing SBP methods based on several standard performance measures. The comparison outcomes revealed a significant superiority of our method over the prevailing state-of-the-art SBP methods across most datasets.

Journal Article
