Catalogue Search | MBRL
Explore the vast range of titles available.
2,133 result(s) for "Binary classification"
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
2020
Background
To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a single preferred measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics adopted in binary classification tasks. However, these statistical measures can dangerously show overoptimistic, inflated results, especially on imbalanced datasets.
Results
The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset.
Conclusions
In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining its mathematical properties, and then demonstrating the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
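The inflation the abstract warns about is easy to reproduce from raw confusion-matrix counts. The sketch below is an illustration, not the authors' code; the convention of returning 0 when the MCC denominator vanishes is an assumption (a common one for degenerate matrices). It scores an all-positive classifier on a dataset with 95 positives and 5 negatives.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1 score, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC denominator is 0 when any row or column of the matrix is empty;
    # we follow the common convention of reporting 0 in that case.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# Imbalanced toy dataset: 95 positives, 5 negatives.
# A trivial classifier that labels everything positive:
acc, f1, mcc = confusion_metrics(tp=95, fp=5, tn=0, fn=0)
print(f"accuracy={acc:.2f}  F1={f1:.3f}  MCC={mcc:.2f}")
```

Accuracy (0.95) and F1 (≈0.97) look excellent, while MCC is 0: the classifier has learned nothing about the negative class.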
Journal Article
General Performance Score for classification problems
by Navarro, Jorge; Moguerza, Javier M; Redondo, Ana R
in Business metrics, Classification, Machine learning
2022
Several performance metrics are currently available to evaluate the performance of Machine Learning (ML) models in classification problems. ML models are usually assessed using a single measure because it facilitates comparison between models. However, there is no silver bullet, since each performance metric emphasizes a different aspect of the classification; the choice depends on the particular requirements and characteristics of the problem. An additional difficulty arises in multi-class classification problems, since most of the well-known metrics are only directly applicable to binary classification. In this paper, we propose the General Performance Score (GPS), a methodological approach to building performance metrics for binary and multi-class classification problems. The basic idea behind GPS is to combine a set of individual metrics, penalising low values in any of them. Thus, users can combine several performance metrics that are relevant to the particular problem based on their preferences, obtaining a conservative combination. Different GPS-based performance metrics are compared with alternatives in classification problems using real and simulated datasets. The metrics built using the proposed method improve the stability and explainability of the usual performance metrics. Finally, the GPS brings benefits in both new research lines and practical usage, where performance metrics tailored to each particular problem are needed.
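The abstract does not reproduce the exact GPS formula, but the core idea (combining metrics while penalising a low value in any of them) can be illustrated with a weighted harmonic mean, which is one conservative combination with that property. The function below is a hedged stand-in, not the paper's definition.

```python
def conservative_combination(metrics, weights=None):
    """Combine several performance metrics (each in [0, 1]) into one score
    via a weighted harmonic mean, which any low component drags down.

    Illustrative stand-in for the idea behind GPS, not its exact formula."""
    if weights is None:
        weights = [1.0] * len(metrics)
    if any(m == 0 for m in metrics):
        return 0.0  # the harmonic mean is 0 if any component is 0
    return sum(weights) / sum(w / m for w, m in zip(weights, metrics))

# Uniformly good metrics keep their level; one poor metric is penalised:
balanced = conservative_combination([0.9, 0.9, 0.9])    # -> 0.9
lopsided = conservative_combination([0.99, 0.99, 0.2])  # well below the
print(round(balanced, 3), round(lopsided, 3))           # arithmetic mean 0.727
```

Compare with a plain average: the lopsided classifier would score about 0.73 under an arithmetic mean, but the harmonic combination drops it below 0.5.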
Journal Article
The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation
2021
Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including, for example, the prognosis or therapies of patients in critical conditions. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (having true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F1 score have already been shown. In this manuscript, we reaffirm that MCC is a robust metric that summarizes the classifier performance in a single value, if positive and negative cases are of equal importance. We compare MCC to other metrics which value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response. Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative among the single metrics discussed, and suggest it as a standard measure for scientists of all fields. A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy, and F1 score.
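All four metrics compared in this abstract are simple functions of the two-class confusion matrix. The sketch below (illustrative, not the authors' code) computes them and numerically checks one of the mathematical relationships alluded to: MCC is the signed geometric mean of bookmaker informedness and markedness (MCC² = BM · MK).

```python
import math

def two_class_metrics(tp, fn, tn, fp):
    """BA, BM, MK, and MCC from two-class confusion-matrix counts.
    Assumes no row or column of the matrix is empty."""
    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)   # negative predictive value
    ba = (tpr + tnr) / 2   # balanced accuracy
    bm = tpr + tnr - 1     # bookmaker informedness (Youden's J)
    mk = ppv + npv - 1     # markedness
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return ba, bm, mk, mcc

# Imbalanced example: 90 positives, 10 negatives.
ba, bm, mk, mcc = two_class_metrics(tp=85, fn=5, tn=3, fp=7)
print(f"BA={ba:.3f} BM={bm:.3f} MK={mk:.3f} MCC={mcc:.3f}")
# MCC is the geometric mean of BM and MK, with the sign of BM:
assert abs(mcc - math.copysign(math.sqrt(abs(bm * mk)), bm)) < 1e-9
```

Here BA is a middling 0.62 while MCC is a much lower 0.27, reflecting the poor precision on the rare negative class.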
Journal Article
The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification
2023
Binary classification is a common task for which machine learning and computational statistics are used, and the area under the receiver operating characteristic curve (ROC AUC) has become the common standard metric to evaluate binary classifications in most scientific fields. The ROC curve has the true positive rate (also called sensitivity or recall) on the y axis and the false positive rate on the x axis, and the ROC AUC can range from 0 (worst result) to 1 (perfect result). The ROC AUC, however, has several flaws and drawbacks. This score is generated including predictions that obtained insufficient sensitivity and specificity, and moreover it says nothing about the positive predictive value (also known as precision) or the negative predictive value (NPV) obtained by the classifier, therefore potentially generating inflated, overoptimistic results. Since it is common to report the ROC AUC alone, without precision and negative predictive value, a researcher might erroneously conclude that their classification was successful. Furthermore, a given point in the ROC space does not identify a single confusion matrix nor a group of matrices sharing the same MCC value. Indeed, a given (sensitivity, specificity) pair can cover a broad MCC range, which casts doubts on the reliability of the ROC AUC as a performance measure. In contrast, the Matthews correlation coefficient (MCC) generates a high score in its [-1, +1] interval only if the classifier scored a high value for all four basic rates of the confusion matrix: sensitivity, specificity, precision, and negative predictive value. A high MCC (for example, MCC = 0.9), moreover, always corresponds to a high ROC AUC, but not vice versa. In this short study, we explain why the Matthews correlation coefficient should replace the ROC AUC as the standard statistic in all scientific studies involving a binary classification, in all scientific fields.
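The claim that a single (sensitivity, specificity) pair can cover a broad MCC range is easy to check numerically. The sketch below (an illustration, not from the paper) fixes one ROC-space point and shows how the implied MCC changes with class prevalence.

```python
import math

def mcc_from_rates(sens, spec, pos, neg):
    """MCC for a classifier with the given sensitivity and specificity,
    applied to a dataset with `pos` positive and `neg` negative examples."""
    tp = sens * pos
    fn = (1 - sens) * pos
    tn = spec * neg
    fp = (1 - spec) * neg
    return ((tp * tn - fp * fn) /
            math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))

# The same ROC-space point (sensitivity 0.9, specificity 0.9) ...
balanced   = mcc_from_rates(0.9, 0.9, pos=500, neg=500)  # 50% prevalence
imbalanced = mcc_from_rates(0.9, 0.9, pos=10, neg=990)   #  1% prevalence
print(round(balanced, 3), round(imbalanced, 3))
```

On the balanced dataset, MCC is 0.8; on the 1%-prevalence dataset the same ROC point yields an MCC around 0.26, because precision collapses. The ROC AUC cannot see this difference.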
Journal Article
Investigating a Serious Challenge in the Sustainable Development Process: Analysis of Confirmed cases of COVID-19 (New Type of Coronavirus) Through a Binary Classification Using Artificial Intelligence and Regression Analysis
by Piro, Patrizia; Pirouz, Behrouz; Shaffiee Haghshenas, Sami
in Artificial intelligence, Coronaviruses, COVID-19
2020
Nowadays, sustainable development is considered a key concept and solution for creating a promising and prosperous future for human societies. Nevertheless, among both the predicted and unpredicted problems societies face, epidemic diseases are real and complex ones. Hence, in this research work, a serious challenge in the sustainable development process was investigated through the classification of confirmed cases of COVID-19 (the new type of coronavirus), one of these epidemic diseases. Binary classification modeling was performed with the group method of data handling (GMDH) type of neural network, one of the artificial intelligence methods. For this purpose, the Hubei province in China was selected as a case study to construct the proposed model. Several important factors, namely the maximum, minimum, and average daily temperature, the density of the city, relative humidity, and wind speed, were considered as the input dataset, and the number of confirmed cases was selected as the output dataset for 30 days. The proposed binary classification model showed a high performance capacity in predicting the confirmed cases. In addition, a regression analysis was carried out, and the trend of confirmed cases was compared with the fluctuations of the daily weather parameters (wind, humidity, and average temperature). The results demonstrated that relative humidity and maximum daily temperature had the highest impact on the confirmed cases: relative humidity in the main case study, with an average of 77.9%, affected the confirmed cases positively, and maximum daily temperature, with an average of 15.4 °C, affected them negatively.
Journal Article
Early-Stage Alzheimer’s Disease Categorization Using PET Neuroimaging Modality and Convolutional Neural Networks in the 2D and 3D Domains
by Rehman, Ateeq Ur; Othman, Mohamed Tahar Ben; Shafiq, Muhammad
in Alzheimer's disease, binary classification, Biomarkers
2022
Alzheimer's Disease (AD) is a health concern of significant proportions that is negatively impacting the ageing population globally. It is characterized by neuronal loss and the formation of structures such as neurofibrillary tangles and amyloid plaques in both the early and later stages of the disease. Neuroimaging modalities are routinely used in clinical practice to capture brain alterations associated with AD, while deep learning methods are routinely used to recognize patterns in underlying data distributions effectively. This work uses Convolutional Neural Network (CNN) architectures in both the 2D and 3D domains to classify the initial stages of AD into AD, Mild Cognitive Impairment (MCI), and Normal Control (NC) classes using the positron emission tomography (PET) neuroimaging modality, deploying data augmentation in a random zoomed in/out scheme. We used novel concepts such as the blurring-before-subsampling principle and distant domain transfer learning to build the 2D CNN architectures. We performed three binary classification tasks (AD/NC, AD/MCI, MCI/NC) and one multiclass classification task (AD/NC/MCI). The statistical comparison revealed that the 3D-CNN architecture performed best, achieving an accuracy of 89.21% on AD/NC, 71.70% on AD/MCI, 62.25% on NC/MCI, and 59.73% on AD/NC/MCI classification tasks using a five-fold cross-validation hyperparameter selection approach. Data augmentation helped in achieving superior performance on the multiclass classification task. The obtained results support the application of deep learning models towards early recognition of AD.
Journal Article
Interpretable classification models for recidivism prediction
by Rudin, Cynthia; Ustun, Berk; Zeng, Jiaming
in Binary classification, Interpretability, Machine learning
2017
We investigate a long-debated question: how to create predictive models of recidivism that are sufficiently accurate, transparent, and interpretable to use for decision making. This question is complicated because these models are used to support different decisions, from sentencing to determining release on probation to allocating preventative social services. Each case might have an objective other than classification accuracy, such as a desired true positive rate (TPR) or false positive rate (FPR). Each (TPR, FPR) pair is a point on the receiver operating characteristic (ROC) curve. We use popular machine learning methods to create models along the full ROC curve on a wide range of recidivism prediction problems. We show that many methods (support vector machines, stochastic gradient boosting, and ridge regression) produce equally accurate models along the full ROC curve. However, methods that are designed for interpretability (classification and regression trees and C5.0) cannot always be tuned to produce models that are both accurate and interpretable. To handle this shortcoming, we use a recent method called supersparse linear integer models to produce accurate, transparent, and interpretable scoring systems along the full ROC curve. These scoring systems can be used for decision making in many different use cases, since they are just as accurate as the most powerful black-box machine learning models for many applications, yet completely transparent and highly interpretable.
Journal Article
A Two-Step Data Normalization Approach for Improving Classification Accuracy in the Medical Diagnosis Domain
2022
Data normalization is a data preprocessing task and one of the first to be performed during intellectual analysis, particularly in the case of tabular data. It matters because it reduces the sensitivity of an artificial intelligence model to the absolute values of the features in the dataset, thereby increasing the model's adequacy. This paper focuses on the problem of effectively preprocessing data to improve the accuracy of intellectual analysis when performing medical diagnostic tasks. We developed a new two-step method for the data normalization of numerical medical datasets. It is based on considering both the interdependencies between the features of each observation in the dataset and their absolute values, to improve accuracy when performing medical data mining tasks. We describe and substantiate each step of the algorithmic implementation of the method and visualize its results. The proposed method was modeled using six different machine learning methods based on decision trees when performing binary and multiclass classification tasks. We used six real-world, freely available medical datasets with different numbers of vectors, attributes, and classes to conduct the experiments. A comparison between the effectiveness of the developed method and that of five existing data normalization methods was carried out. It was experimentally established that the developed method increases the accuracy of the Decision Tree and Extra Trees classifiers by 1–5% when performing the binary classification task, and the accuracy of the Bagging, Decision Tree, and Extra Trees classifiers by 1–6% when performing the multiclass classification task. Increasing the accuracy of these classifiers by the new data normalization method alone satisfies all the prerequisites for its application in practice when performing various medical data mining tasks.
Journal Article
Training and assessing classification rules with imbalanced data
by Menardi, Giovanna; Torelli, Nicola
in Accuracy, Artificial Intelligence, Chemistry and Earth Sciences
2014
The problem of modeling binary responses using cross-sectional data has been addressed with a number of satisfying solutions that draw on both parametric and nonparametric methods. However, there exist many real situations where one of the two responses (usually the most interesting for the analysis) is rare. It has been widely reported that this class imbalance heavily compromises the learning process, because the model tends to focus on the prevalent class and to ignore the rare events. Moreover, not only is the estimation of the classification model affected by a skewed distribution of the classes, but the evaluation of its accuracy is also jeopardized, because the scarcity of data leads to poor estimates of the model's accuracy. In this work, the effects of class imbalance on model training and model assessment are discussed. Moreover, a unified and systematic framework for dealing with the problem of imbalanced classification is proposed, based on a smoothed bootstrap re-sampling technique. The proposed technique is founded on a sound theoretical basis, and an extensive empirical study shows that it outperforms the main alternative remedies for imbalanced learning problems.
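The abstract does not spell out the smoothed bootstrap procedure; the sketch below illustrates only the general idea under assumed choices (a Gaussian kernel with a fixed bandwidth): resample the rare class with replacement and perturb each draw with kernel noise, rather than duplicating observations exactly.

```python
import random

def smoothed_bootstrap_oversample(minority, n_new, bandwidth=0.1, seed=0):
    """Draw n_new synthetic minority-class points by resampling with
    replacement and adding Gaussian kernel noise (a smoothed bootstrap).

    `minority` is a list of feature vectors (lists of floats). The Gaussian
    kernel and fixed bandwidth are illustrative assumptions, not the
    paper's exact choices.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)  # bootstrap draw from the rare class
        # Perturb each feature instead of copying it verbatim:
        synthetic.append([x + rng.gauss(0.0, bandwidth) for x in base])
    return synthetic

rare = [[0.0, 1.0], [0.2, 0.8], [0.1, 1.1]]
new_points = smoothed_bootstrap_oversample(rare, n_new=5)
print(len(new_points), len(new_points[0]))
```

Because the new points are jittered rather than duplicated, a classifier trained on the augmented data sees a smoothed estimate of the rare class's distribution instead of repeated copies of the same few observations.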
Journal Article
A Genetic Programming Approach to Binary Classification Problem
by Santoso, Leo Willyanto; Karrar Hameed Kadhim; S. Suman Rajest
in Artificial intelligence, Artificial neural networks, Classification
2021
Binary classification is one of the most challenging problems in machine learning. One of the most promising techniques for solving it is genetic programming (GP), an Evolutionary Algorithm (EA) used to solve problems that humans do not know how to solve directly. The objective of this research is to demonstrate the use of genetic programming on this type of problem, for which other techniques are typically used, e.g., regression or artificial neural networks. Genetic programming presents an advantage over those techniques: it does not need an a priori definition of its structure. The algorithm evolves automatically until it finds the model that best fits a set of training data. Feature engineering was considered to improve accuracy; in this research, feature transformation and feature creation were implemented. Thus, genetic programming can be considered an alternative option for the development of intelligent systems, mainly in the pattern recognition field.
Journal Article