Search Results

32,325 result(s) for "Variable (computer science)"
The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation
Regression analysis makes up a large part of supervised machine learning, and consists of predicting a continuous dependent target from a set of predictor variables. The difference between binary classification and regression lies in the target range: in binary classification the target can take only two values (usually encoded as 0 and 1), while in regression it can take many. Although regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric for assessing the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these metrics share a common drawback: since their values can range from zero to +infinity, a single value says little about the performance of the regression relative to the distribution of the ground-truth elements. In this study, we focus on two metrics that produce a high score only if the majority of the ground-truth elements have been correctly predicted: the coefficient of determination (also known as R-squared or R²) and the symmetric mean absolute percentage error (SMAPE). After presenting their mathematical properties, we report a comparison between R² and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest R-squared as the standard metric for evaluating regression analyses in any scientific domain.
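The metrics this abstract contrasts are easy to state in code. A minimal Python sketch using standard textbook definitions rather than anything taken from the paper itself (the SMAPE variant with a halved denominator is one of several in use):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, bounded above by 1."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error; this variant lies in [0, 2]."""
    return sum(abs(p - t) / ((abs(t) + abs(p)) / 2)
               for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.3, 8.7]
good_fit = r_squared(y_true, y_pred)   # close to 1 for a good fit
```

Note the asymmetry the abstract points to: R² has a fixed upper bound of 1 (perfect prediction), so a single value can be read against the spread of the ground truth, whereas MSE-style scores are unbounded above.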
Extreme-value-theoretic estimation of local intrinsic dimensionality
This paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. The local model can be regarded as an extension of Karger and Ruhl’s expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. This form of intrinsic dimensionality can be particularly useful in search, classification, outlier detection, and other contexts in machine learning, databases, and data mining, as it has been shown to be equivalent to a measure of the discriminative power of similarity functions. Several estimators of local ID are proposed and analyzed based on extreme value theory, using maximum likelihood estimation, the method of moments, probability weighted moments, and regularly varying functions. An experimental evaluation is also provided, using both real and artificial data.
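One of the extreme-value-theoretic estimators the abstract mentions, the maximum-likelihood (Hill-type) estimator, is compact enough to sketch. The formula below is the commonly cited MLE for local ID from nearest-neighbor distances; treat it as an illustration rather than the paper's exact formulation:

```python
import math
import random

def lid_mle(distances):
    """MLE of local intrinsic dimensionality from positive distances to a
    query point's k nearest neighbors: -k / sum(log(r_i / r_max))."""
    w = max(distances)
    k = len(distances)
    s = sum(math.log(d / w) for d in distances if d < w)
    return -k / s

# Sanity check: distances r with P(R <= r) = r^d (e.g. uniform samples in a
# d-dimensional ball) should yield an estimate close to d.
random.seed(0)
d = 3
dists = [random.random() ** (1.0 / d) for _ in range(1000)]
estimate = lid_mle(dists)   # should be near 3
```

The estimate is exactly the quantity the abstract calls a measure of discriminative power: the smaller the local ID, the faster the distance distribution grows near the query, and the easier neighbors are to tell apart.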
Automated discovery of fundamental variables hidden in experimental data
All physical laws are described as mathematical relationships between state variables. These variables give a complete and non-redundant description of the relevant system. However, despite the prevalence of computing power and artificial intelligence, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modelling physical phenomena still rely on the assumption that the relevant state variables are already known. A longstanding question is whether it is possible to identify state variables from only high-dimensional observational data. Here we propose a principle for determining how many state variables an observed system is likely to have, and what these variables might be. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables.
CatBoost for big data: an interdisciplinary review
Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and to learn best practices both from studies that cast CatBoost in a positive light and from studies where CatBoost does not outshine other techniques, since lessons can be drawn from both types of scenario. Furthermore, as a decision-tree-based algorithm, CatBoost is well suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost's effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach and cover studies related to CatBoost in a single work. This provides researchers with an in-depth understanding that helps clarify the proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
A novel chaotic image encryption algorithm based on improved baker map and logistic map
A novel image encryption algorithm based on double chaotic systems is proposed in this paper. On account of the limited chaotic range and vulnerability of a single chaotic map, we use the two-dimensional Baker chaotic map to control the system parameters and the state variable of the logistic chaotic map. Under this control, the parameter of the logistic map varies over time, and the generated logistic sequence is non-stationary. The improved map has been shown to be random and unpredictable by complexity analysis. Furthermore, a novel image encryption algorithm, including shuffling and substitution processes, is proposed based on the improved chaotic maps. Numerous statistical tests and security analyses indicate that this algorithm has excellent security performance and is competitive with other recently proposed image encryption algorithms.
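The coupling the abstract describes, one chaotic map driving the parameter of another, can be illustrated in a few lines. This is a generic sketch, not the paper's exact scheme: a classical 2-D Baker map steers the logistic-map parameter r through the chaotic band, so the generated sequence is non-stationary.

```python
def baker(x, y):
    """Classical two-dimensional Baker map on the unit square."""
    if x < 0.5:
        return 2.0 * x, y / 2.0
    return 2.0 * x - 1.0, (y + 1.0) / 2.0

def keystream(n, x=0.1234, y=0.5678, z=0.3456):
    """Logistic map z' = r*z*(1-z), with r in [3.57, 4) chosen each step
    from the Baker-map state, so the parameter (and sequence) keeps changing."""
    out = []
    for _ in range(n):
        x, y = baker(x, y)
        r = 3.57 + 0.43 * x          # map Baker state into the chaotic band
        z = r * z * (1.0 - z)
        out.append(z)
    return out

seq = keystream(8)
```

The sequence stays in (0, 1) and is fully determined by the three initial conditions, which play the role of the secret key in schemes of this kind.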
Particle swarm optimization algorithm: an overview
Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm motivated by the intelligent collective behavior of animals such as flocks of birds or schools of fish. Since it was first presented in 1995, it has undergone a multitude of enhancements. As researchers have learned about the technique, they have derived new versions aimed at different demands, developed new applications in a host of areas, published theoretical studies on the effects of the various parameters, and proposed many variants of the algorithm. This paper introduces the origin and background of PSO and carries out a theoretical analysis of the algorithm. We then review the current state of research and applications concerning algorithm structure, parameter selection, topology structure, discrete and parallel PSO algorithms, multi-objective PSO, and engineering applications. Finally, existing problems are analyzed and future research directions are presented.
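The core global-best PSO update that all the variants surveyed here build on fits in a short sketch. This is a generic textbook variant with assumed inertia and acceleration constants, not any specific version from the paper:

```python
import random

def pso(f, dim=2, n_particles=20, iters=200, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=42):
    """Global-best PSO minimizing f over a box [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                      # inertia
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

sphere = lambda p: sum(x * x for x in p)
best, best_val = pso(sphere)   # converges near the origin
```

The three velocity terms (inertia, cognitive, social) are exactly the knobs that the parameter-selection and topology literature the paper surveys is about.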
Innovative modeling techniques including MEP, ANN and FQ to forecast the compressive strength of geopolymer concrete modified with nanoparticles
The use of nano-materials to improve the engineering properties of different types of concrete composites, including geopolymer concrete (GPC), has recently gained popularity. Numerous programs have been executed to investigate the mechanical properties of GPC. In general, compressive strength (CS) is an essential mechanical indicator for judging the quality of concrete. Traditional test methods for determining the CS of GPC are expensive, time-consuming and limiting because of the complicated interplay of a wide variety of mixing proportions and curing regimes. Therefore, in this study, artificial neural network (ANN), multi-expression programming, full quadratic, linear regression and M5P-tree machine learning techniques were used to predict the CS of GPC. Around 207 tested CS values were extracted from the literature and used to develop the models. During modeling, eleven effective variables were used as input parameters and one variable as the output. Four statistical indicators were used to judge model performance, and a sensitivity analysis was carried out. According to the results, the ANN model predicted the CS of GPC with greater precision than the other models. Moreover, the ratio of alkaline solution to binder, molarity, NaOH content, curing temperature and concrete age have substantial effects on the CS of GPC.
Understanding the Design Elements Affecting User Acceptance of Intelligent Agents: Past, Present and Future
Intelligent agents (IAs) are permeating both business and society. However, interacting with IAs poses challenges that move beyond technological limitations toward the human-computer interface. The knowledge base on interaction with IAs has thus grown exponentially, but it remains segregated, which impedes the advancement of the field. We therefore conduct a systematic literature review to integrate empirical knowledge on user interaction with IAs. This is the first paper to examine 107 Information Systems and Human-Computer Interaction papers and to identify 389 relationships between design elements and user acceptance of IAs. Along the independent and dependent variables of these relationships, we span a research-space model encompassing empirical research on designing for IA user acceptance. Further, we contribute to theory by presenting a research agenda along the dimensions of the research space, which should be useful to both researchers and practitioners. This complements past and present knowledge on designing for IA user acceptance with potential pathways into the future of IAs.
Graph Theoretic Methods in Multiagent Networks
This accessible book provides an introduction to the analysis and design of dynamic multiagent networks. Such networks are of great interest in a wide range of areas in science and engineering, including: mobile sensor networks, distributed robotics such as formation flying and swarming, quantum networks, networked economics, biological synchronization, and social networks. Focusing on graph theoretic methods for the analysis and synthesis of dynamic multiagent networks, the book presents a powerful new formalism and set of tools for networked systems. The book's three sections look at foundations, multiagent networks, and networks as systems. The authors give an overview of important ideas from graph theory, followed by a detailed account of the agreement protocol and its various extensions, including the behavior of the protocol over undirected, directed, switching, and random networks. They cover topics such as formation control, coverage, distributed estimation, social networks, and games over networks. And they explore intriguing aspects of viewing networks as systems, by making these networks amenable to control-theoretic analysis and automatic synthesis, by monitoring their dynamic evolution, and by examining higher-order interaction models in terms of simplicial complexes and their applications. The book will interest graduate students working in systems and control, as well as in computer science and robotics. It will be a standard reference for researchers seeking a self-contained account of system-theoretic aspects of multiagent networks and their wide-ranging applications. This book has been adopted as a textbook at the following universities: University of Stuttgart, Germany; Royal Institute of Technology, Sweden; Johannes Kepler University, Austria; Georgia Tech, USA; University of Washington, USA; Ohio University, USA.
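The agreement protocol the description highlights can be illustrated in a few lines: each agent repeatedly moves toward its neighbors' states, and on a connected undirected graph all states converge to the average of the initial conditions. The graph and step size below are chosen for illustration only.

```python
def consensus_step(x, edges, eps=0.1):
    """One discrete-time agreement update: x_i += eps * sum_j (x_j - x_i)
    over the neighbors j of agent i, applied symmetrically per edge."""
    nxt = list(x)
    for i, j in edges:
        nxt[i] += eps * (x[j] - x[i])
        nxt[j] += eps * (x[i] - x[j])
    return nxt

# Path graph on four agents: 0 - 1 - 2 - 3 (connected and undirected)
edges = [(0, 1), (1, 2), (2, 3)]
x = [1.0, 5.0, 2.0, 8.0]
avg = sum(x) / len(x)            # the consensus value, here 4.0
for _ in range(200):
    x = consensus_step(x, edges)
```

Because each edge's update is symmetric, the sum of the states is invariant, which is why the protocol converges to the average rather than some other common value.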
Research on water temperature prediction based on improved support vector regression
This paper presents a model for predicting reservoir water temperature that incorporates solar radiation, in order to analyze and evaluate the water temperature of large high-altitude reservoirs in western China. Mutual information inspection shows that this variable correlates well with water temperature, so it is added to the sample features used to train the model. Measured water temperature data from the reservoir over many years are then used to establish a support vector regression (SVR) model, and a genetic algorithm (GA) is introduced to optimize its parameters, yielding an improved support vector machine (M-GASVR). Root-mean-square error, mean absolute error, mean absolute percentage error, and the Nash–Sutcliffe efficiency coefficient are used as criteria for evaluating the performance of the SVR, ANN, GA-SVR, and M-GASVR models. In addition, the M-GASVR model is used to simulate the water temperature of the reservoir under different working conditions. The results show that the ANN model performs worst among the four, the GA-SVR model outperforms the SVR model on these metrics, and the M-GASVR model performs best. For non-stationary sequences, the M-GASVR prediction model can predict the vertical water temperature and thermal structure in the reservoir area well. This study provides useful insights into the prediction of vertical water temperature at different depths of reservoirs.
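The GA-for-hyperparameters idea behind M-GASVR can be sketched generically. Everything below is assumed for illustration: a toy quadratic stands in for the cross-validation error of an SVR with penalty C and kernel width gamma, and the GA itself is a minimal real-coded variant, not the paper's.

```python
import random

def genetic_search(fitness, bounds, pop_size=30, gens=60, seed=7):
    """Minimal real-coded GA: keep the better half, breed the rest by
    averaging two elites (blend crossover) and mutating one gene."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(gens):
        elite = sorted(pop, key=fitness)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [(ai + bi) / 2 for ai, bi in zip(a, b)]   # blend crossover
            k = rng.randrange(dim)                            # mutate one gene
            lo, hi = bounds[k]
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

# Toy stand-in for "cross-validation error of SVR(C, gamma)"; the optimum
# at C = 10, gamma = 0.5 is invented for this example.
def cv_error(params):
    C, gamma = params
    return (C - 10.0) ** 2 / 100.0 + (gamma - 0.5) ** 2

best = genetic_search(cv_error, bounds=[(0.1, 100.0), (0.001, 1.0)])
```

In the paper's setting, the fitness function would train and validate an SVR at each candidate (C, gamma) pair, which is why a cheap population-based search is attractive there.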