Catalogue Search | MBRL
Explore the vast range of titles available.
2,918 result(s) for "parameter tuning"
On combining system and machine learning performance tuning for distributed data stream applications
2023
The growing need to identify patterns in data and automate decisions based on them in near-real time has stimulated the development of new machine learning (ML) applications that process continuous data streams. However, deploying ML applications over distributed stream processing engines (DSPEs) such as Apache Spark Streaming is a complex procedure that requires extensive tuning along two dimensions. First, DSPEs have a plethora of system configuration parameters, such as the degree of parallelism and memory buffer sizes, that directly impact application throughput and/or latency and need to be optimized. Second, ML models have their own set of hyperparameters that require tuning, as they can significantly affect the overall prediction accuracy of the trained model. These two forms of tuning have been studied extensively in the literature, but only in isolation from each other. This manuscript presents a comprehensive experimental study that combines system configuration and hyperparameter tuning of ML applications over DSPEs. The experimental results reveal unexpected and complex interactions between the choices of system configurations and hyperparameters, and their impact on both application and model performance. These insights motivate the need for new combined system and ML model tuning approaches, and open up new research directions in the field of self-managing distributed stream processing systems.
Journal Article
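The joint tuning this abstract argues for can be illustrated as a single grid search over both dimensions at once. This is a minimal sketch: the configuration names (`parallelism`, `buffer_mb`) and the scoring function are hypothetical stand-ins, not the study's actual setup, which deploys real applications and measures throughput, latency, and model accuracy.

```python
from itertools import product

# Hypothetical search spaces: DSPE system knobs and ML hyperparameters.
system_grid = {"parallelism": [2, 4, 8], "buffer_mb": [64, 256]}
model_grid = {"learning_rate": [0.01, 0.1], "max_depth": [3, 6]}

def evaluate(system_cfg, model_cfg):
    """Stand-in for deploying the app and measuring a combined score
    (e.g. accuracy minus a latency penalty); replace with real measurements."""
    acc = 0.9 - 0.05 * (model_cfg["max_depth"] == 3)
    latency = 100 / system_cfg["parallelism"] + model_cfg["learning_rate"] * 10
    return acc - 0.001 * latency

def joint_grid_search(system_grid, model_grid):
    """Search system and model parameters together, not in isolation."""
    best = None
    for sys_vals in product(*system_grid.values()):
        sys_cfg = dict(zip(system_grid, sys_vals))
        for mod_vals in product(*model_grid.values()):
            mod_cfg = dict(zip(model_grid, mod_vals))
            score = evaluate(sys_cfg, mod_cfg)
            if best is None or score > best[0]:
                best = (score, sys_cfg, mod_cfg)
    return best

score, sys_cfg, mod_cfg = joint_grid_search(system_grid, model_grid)
```

The point of the combined search is exactly the interaction the abstract highlights: the best model hyperparameters can depend on the system configuration, which an isolated search would miss.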
Hypothesis testing in finite mixture of regressions
by VIDYASHANKAR, Anand N.; KHALILI, Abbas
in Adjusted p‐value; Asymptotic methods; BIC‐enhanced tuning parameter
2018
Sparse finite mixtures of regression models arise in several scientific applications, and testing hypotheses concerning regression coefficients in such models is fundamental to data analysis. In this article, we describe an approach for hypothesis testing of regression coefficients that takes into account model selection uncertainty. The proposed methods involve (i) estimating the active predictor set of the sparse model using a consistent model selector and (ii) testing hypotheses concerning the regression coefficients associated with the estimated active predictor set. The methods asymptotically control the family-wise error rate at a pre-specified nominal level, while accounting for variable selection uncertainty. Additionally, we provide examples of consistent model selectors and describe methods for finite sample improvements. Performance of the methods is also illustrated using simulations. A real data analysis is included to illustrate the applicability of the methods.
Journal Article
An Overview of Variants and Advancements of PSO Algorithm
by Singh, Narinder; Jain, Meetu; Saihjpal, Vibha
in advances in particle swarm optimization; Chemistry; Exploitation
2022
Particle swarm optimization (PSO) is one of the most famous nature-inspired, swarm-based optimization techniques. Its flexibility and ease of implementation have led to an enormous increase in its popularity, and it has attracted attention from researchers in every field. Since its origin in 1995, researchers have improved the original PSO in various ways, deriving new versions of it: published theoretical studies on the various parameters of PSO, many variants of the algorithm, and numerous other advances. The present paper gives an overview of the PSO algorithm. On the one hand, the basic concepts and parameters of PSO are explained; on the other, various advances related to PSO, including its modifications, extensions, hybridizations, and theoretical analyses, are covered.
Journal Article
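The basic velocity and position updates that this overview covers can be sketched in a few lines. This is a minimal illustration, not code from the paper; the inertia weight `w` and acceleration coefficients `c1`/`c2` below are common textbook defaults, precisely the kind of parameters whose tuning the surveyed variants address.

```python
import random

def pso_minimize(f, dim=2, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch: w is the inertia weight, c1 the cognitive
    (personal-best) and c2 the social (global-best) coefficient."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimize the sphere function as a toy example.
best, best_val = pso_minimize(lambda x: sum(v * v for v in x))
```

The exploitation/exploration trade-off mentioned among the subject terms lives in these three coefficients: a larger `w` keeps particles exploring, while larger `c1`/`c2` pull them toward known good regions.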
A random forest guided tour
2016
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
Journal Article
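The three ingredients the review emphasizes — the number of trees, the number of candidate variables per split, and the resampling mechanism — map directly onto parameters of common implementations. As a hedged illustration (the review itself is methodological and tied to no library), here is how they appear in scikit-learn, on synthetic data standing in for a variables-much-larger-than-observations setting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a p >> n regime: 50 variables, few informative.
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

# The parameters highlighted in the review: number of trees (n_estimators),
# candidate variables per split (max_features), and bootstrap resampling.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            bootstrap=True, random_state=0)
rf.fit(X, y)

importances = rf.feature_importances_  # variable importance measures
```

The `feature_importances_` vector is one concrete instance of the variable importance measures the article analyzes.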
Comparison of Random Forest and Support Vector Machine Classifiers for Regional Land Cover Mapping Using Coarse Resolution FY-3C Images
2022
The type of algorithm employed to classify remote sensing imagery plays a great role in determining accuracy. In recent decades, machine learning (ML) has received great attention due to its robustness in remote sensing image classification. In this regard, random forest (RF) and support vector machine (SVM) are two of the most widely used ML algorithms for generating land cover (LC) maps from satellite imagery. Although several comparisons have been conducted between these two algorithms, the findings are contradictory. Moreover, those comparisons addressed local-scale LC map generation from either high or medium resolution images using various software packages, but not Python. In this paper, we compared the performance of these two algorithms for large-area LC mapping of parts of Africa using coarse resolution imagery on the Python platform, employing the Scikit-Learn (sklearn) library. We used a large dataset of 297 metrics, comprising systematically selected 9-month composite FengYun-3C (FY-3C) satellite images with 1 km resolution. Several experiments were performed over a range of values to determine the best settings for the two most important parameters of each classifier (the number of trees and the number of variables for RF; the penalty value and gamma for SVM) and to obtain the best model of each algorithm. Our results showed that RF outperformed SVM, yielding an overall accuracy (OA) of 0.86 and a kappa (k) of 0.83, which are 1–2% and 3% higher than the best SVM model, respectively. In addition, RF performed better in mixed-class classification; however, the two performed almost the same when classifying relatively pure classes with distinct spectral variation, i.e., those consisting of fewer mixed pixels. Furthermore, RF is more efficient at handling large input datasets where SVM fails. Hence, RF is the more robust ML algorithm, especially for heterogeneous large-area mapping using coarse resolution images. Finally, default parameter values in the sklearn library work well for satellite image classification, with minor or no adjustment needed for these algorithms.
Journal Article
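The tuning protocol this abstract describes — trying a range of values for trees and variables-per-split for RF, and for the penalty value and gamma for SVM — corresponds to a standard sklearn grid search. The sketch below uses a small synthetic dataset in place of the FY-3C pixel metrics, and the grid values are illustrative, not the paper's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Toy stand-in for the satellite pixel metrics.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RF: number of trees and number of candidate variables per split.
rf = GridSearchCV(RandomForestClassifier(random_state=0),
                  {"n_estimators": [100, 300], "max_features": [3, 5]},
                  cv=3).fit(X_tr, y_tr)

# SVM: penalty value C and kernel coefficient gamma.
svm = GridSearchCV(SVC(),
                   {"C": [1, 10], "gamma": ["scale", 0.01]},
                   cv=3).fit(X_tr, y_tr)

rf_acc, svm_acc = rf.score(X_te, y_te), svm.score(X_te, y_te)
```

On real imagery the comparison would use the paper's OA and kappa metrics rather than plain held-out accuracy, and which classifier wins depends on the data, as the contradictory findings cited above attest.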
Review and Comparison of Genetic Algorithm and Particle Swarm Optimization in the Optimal Power Flow Problem
2023
Metaheuristic optimization techniques have successfully been used to solve the Optimal Power Flow (OPF) problem, addressing the shortcomings of mathematical optimization techniques. Two of the most popular metaheuristics are the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The literature surrounding GA and PSO OPF is vast and not adequately organized. This work filled this gap by reviewing the most prominent works and analyzing the different traits of GA OPF works along seven axes, and of PSO OPF along four axes. Subsequently, cross-comparison between GA and PSO OPF works was undertaken, using the reported results of the reviewed works that use the IEEE 30-bus network to assess the performance and accuracy of each method. Where possible, the practices used in GA and PSO OPF were compared with literature suggestions from other domains. The cross-comparison aimed to act as a first step towards the standardization of GA and PSO OPF, as it can be used to draw preliminary conclusions regarding the tuning of hyper-parameters of GA and PSO OPF. The analysis of the cross-comparison results indicated that works using both GA and PSO OPF offer remarkable accuracy (with GA OPF having a slight edge) and that PSO OPF involves less computational burden.
Journal Article
Deep learning for effective Android malware detection using API call graph embeddings
by Acarman, Tankut; Pektaş, Abdurrahman
in Accuracy; Algorithms; Application programming interface
2020
High penetration of Android applications, along with their malicious variants, requires efficient and effective malware detection methods to secure the mobile platform. An API call sequence derived from the API call graph structure can be used to model application behavior accurately. Behaviors are extracted by following the API call graph, its branching, and the order of calls. However, identifying similarities between graphs with graph matching algorithms for classification is slow, complicated to adapt to a new domain, and can yield inaccurate results. In this study, the authors use the API call graph as a graph representation of all possible execution paths that a malware can follow during its runtime. API call graphs are embedded into low-dimensional numeric feature vectors that are fed to a deep neural network, and similarity detection for each binary function is then trained and tested effectively. The study also focuses on maximizing the performance of the network by evaluating different embedding algorithms and tuning various network configuration parameters, to find the best combination of hyper-parameters and reach the highest statistical metric values. Experimental results show that the presented malware classifier reaches 98.86% accuracy, 98.65% F-measure, 98.47% recall, and 98.84% precision.
Journal Article
Tuning parameter selection in high dimensional penalized likelihood
2013
Determining how to select the tuning parameter appropriately is essential in penalized likelihood methods for high dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in the generalized information criterion. We find that this model complexity penalty should diverge at the rate of some power of log(p) depending on the tail probability behaviour of the response variables. This reveals that using the Akaike information criterion or Bayes information criterion to select the tuning parameter may not be adequate for consistently identifying the true model. On the basis of our theoretical study, we propose a uniform choice of the model complexity penalty and show that the approach proposed consistently identifies the true model among candidate models with asymptotic probability 1. We justify the performance of the procedure proposed by numerical simulations and a gene expression data analysis.
Journal Article
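The selection scheme this abstract describes can be sketched for the lasso special case: score each candidate tuning parameter with a generalized information criterion whose model complexity penalty grows with log(p). This is an illustrative sketch, not the paper's procedure — the paper works with general penalized likelihood for GLMs, and the specific penalty `a_n` below is one assumed choice in the log(p) family:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200                        # p exceeds n, the regime studied
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                         # sparse true model: 3 active predictors
y = X @ beta + rng.standard_normal(n)

# Model complexity penalty diverging with log(p); an assumed illustrative
# choice, in contrast to the fixed constants of AIC (2) or BIC (log n).
a_n = np.log(np.log(n)) * np.log(p)

def gic(lam):
    """GIC-style score: goodness of fit plus a_n per selected variable."""
    fit = Lasso(alpha=lam, max_iter=5000).fit(X, y)
    df = np.count_nonzero(fit.coef_)
    rss = np.sum((y - fit.predict(X)) ** 2)
    return n * np.log(rss / n) + a_n * df

lams = np.logspace(-2, 0, 20)
best_lam = min(lams, key=gic)
selected = np.count_nonzero(Lasso(alpha=best_lam, max_iter=5000).fit(X, y).coef_)
```

Because a_n here grows with log(p) while AIC's and BIC's penalties do not, the criterion is stricter about admitting extra variables, which is the mechanism behind the consistency result the abstract states.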
Performance Optimization of Vehicle-to-vehicle Communication through Reactive Routing Protocol Analysis
2025
The study focuses on improving the Quality of Service (QoS) in Vehicle-to-Vehicle (V2V) communication within Vehicular Ad Hoc Networks (VANETs) by enhancing the Learning Automata-based Ad Hoc On-Demand Distance Vector (LA-AODV) routing protocol. Unlike the standard AODV, which is a reactive routing protocol, and previous configurations of LA-AODV, this research introduces a fine-tuning strategy for the learning automata parameters. This strategy allows the parameters to dynamically adapt to changing network conditions to reduce routing overhead and enhance transmission stability. Three modified versions of LA-AODV referred to as setups A, B, and C, are evaluated against the standard AODV and earlier LA-AODV configurations. The performance of each setup is measured using key QoS metrics: flood ID, packet loss ratio (PLR), packet delivery ratio (PDR), average throughput, end-to-end delay, and jitter. These metrics are crucial in evaluating the efficiency, reliability, and performance of V2V communication systems within VANETs. The results demonstrate that the LA-AODV variants significantly reduce flood ID counts, which represent the number of times a packet is broadcasted, compared to AODV, with setups A and B achieving reductions of 10.24% and 28.74%, respectively, at 200 transmissions, indicating enhanced scalability. Additionally, LA-AODV setup A provides 5.4% higher throughput in high-density scenarios. The modified versions also significantly decrease delay and jitter, achieving reductions of over 99.99% and 99.93%, respectively, at 50 transmissions. These findings underscore the adaptive capabilities of the proposed LA-AODV modifications, providing reassurance about the robustness of the system. They also highlight the importance of parameter optimization in maintaining reliable V2V communication. Future work will benchmark LA-AODV against other state-of-the-art protocols to validate its effectiveness further.
Journal Article