Catalogue Search | MBRL
54,108 result(s) for "Correlation coefficient"
The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
2020
Background
To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can show dangerously overoptimistic, inflated results, especially on imbalanced datasets.
Results
The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset.
Conclusions
In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining its mathematical properties, and then the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
Journal Article
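To make the accuracy/F1 versus MCC contrast concrete, here is a minimal Python sketch that computes all three from a 2×2 confusion matrix. The function name `binary_metrics` and the example counts are hypothetical illustrations, not taken from the article, and returning 0 when a denominator vanishes is a common convention assumed here.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, F1 score, MCC) for a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    # Assumed convention: F1 = 0 when there are no positives at all.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn > 0 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC = 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, f1, mcc

# Hypothetical imbalanced case: 95 positives, 5 negatives,
# only 1 of the 5 negatives classified correctly.
print(binary_metrics(tp=90, fn=5, tn=1, fp=4))
```

Here accuracy is 0.91 and F1 is about 0.95, while MCC is only about 0.135, reflecting the poor specificity (1/5 = 0.2) that the first two metrics hide.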
Improving the reliability of measurements in orthopaedics and sports medicine
by Karlsson, Jon; Mouton, Caroline; Królikowska, Aleksandra
in agreement, Clinical trials, Correlation coefficient
2023
A large space still exists for improving the measurements used in orthopaedics and sports medicine, especially as we face rapid technological progress in devices used for diagnostic or patient monitoring purposes. For a specific measure to be valuable and applicable in clinical practice, its reliability must be established. Reliability refers to the extent to which measurements can be replicated, and three types of reliability can be distinguished: inter-rater, intra-rater, and test–retest. The present article aims to provide insights into reliability as one of the most important and relevant properties of measurement tools. It covers essential knowledge about the methods used in orthopaedics and sports medicine for reliability studies. From design to interpretation, this article guides readers through the reliability study process. It addresses crucial issues such as the number of raters needed, sample size calculation, and breaks between particular trials. Different statistical methods and tests are presented for determining reliability depending on the type of gathered data, with particular attention to the commonly used intraclass correlation coefficient.
Journal Article
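Since the article singles out the intraclass correlation coefficient, a small sketch may help show why the choice of ICC form matters. The following computes two common single-rater forms in Shrout-Fleiss notation from a two-way ANOVA decomposition; the function name and the ratings table are hypothetical, and which form the article recommends for a given design is not stated in the abstract.

```python
def icc_single_rater(ratings):
    """Return (ICC(2,1), ICC(3,1)) in Shrout-Fleiss notation.

    ratings: n rows (subjects), each holding k scores (one per rater); n, k >= 2.
    ICC(2,1): two-way random effects, absolute agreement, single rater.
    ICC(3,1): two-way mixed effects, consistency, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc2 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc3 = (msr - mse) / (msr + (k - 1) * mse)
    return icc2, icc3

# Hypothetical data: rater B scores every subject exactly one point above rater A.
table = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(icc_single_rater(table))
```

Because rater B is consistently one point higher, ICC(3,1) equals 1 while ICC(2,1) drops to 10/13 (about 0.77); which value is appropriate depends on whether the study needs absolute agreement or only consistency between raters.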
The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation
2021
Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including, for example, the prognosis or therapy of patients in critical condition. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (having true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F1 score have already been shown. In this manuscript, we reaffirm that MCC is a robust metric that summarizes classifier performance in a single value, if positive and negative cases are of equal importance. We compare MCC to other metrics which value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response. Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative among the single metrics discussed, and suggest it as the standard measure for scientists of all fields. A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy and F1 score.
Journal Article
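The mathematical relationships between MCC and the other three indicators can be checked numerically. The sketch below computes all four from one confusion matrix; the function name and counts are hypothetical, and it assumes no row or column of the matrix is empty. It illustrates two standard identities: BA = (BM + 1)/2, and MCC² = BM × MK, so the magnitude of MCC is the geometric mean of informedness and markedness.

```python
import math

def confusion_rates(tp, fn, tn, fp):
    """BA, BM, MK and MCC for a 2x2 confusion matrix with no empty margins."""
    tpr = tp / (tp + fn)   # sensitivity (recall)
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)   # negative predictive value
    ba = (tpr + tnr) / 2   # balanced accuracy
    bm = tpr + tnr - 1     # bookmaker informedness
    mk = ppv + npv - 1     # markedness
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"BA": ba, "BM": bm, "MK": mk, "MCC": mcc}

# Hypothetical case: 100 positives, 1,000 negatives.
rates = confusion_rates(tp=90, fn=10, tn=900, fp=100)
print(rates)
print(rates["MCC"] ** 2, rates["BM"] * rates["MK"])  # the two agree
```

Sensitivity and specificity are both 0.9, so BA = 0.9 and BM = 0.8, but precision is only about 0.47, which pulls MK down to about 0.46 and MCC to about 0.61. BA and BM ignore the predictive values; MCC does not.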
Comparison of Values of Pearson's and Spearman's Correlation Coefficients on the Same Sets of Data
by Kossowski, Tomasz; Hauke, Jan
in Correlation coefficient, Economic development, Frequency distribution
2011
Spearman's rank correlation coefficient is a nonparametric (distribution-free) rank statistic proposed by Charles Spearman as a measure of the strength of an association between two variables. It is a measure of a monotone association that is used when the distribution of data makes Pearson's correlation coefficient undesirable or misleading. Spearman's coefficient is not a measure of the linear relationship between two variables, as some "statisticians" declare. It assesses how well an arbitrary monotonic function can describe a relationship between two variables, without making any assumptions about the frequency distribution of the variables. Unlike Pearson's product-moment correlation coefficient, it does not require the assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales; it can be used for variables measured at the ordinal level. The idea of the paper is to compare the values of Pearson's product-moment correlation coefficient and Spearman's rank correlation coefficient, as well as their statistical significance, for different sets of data (original data for Pearson's coefficient and ranked data for Spearman's coefficient) describing regional indices of socio-economic development.
Journal Article
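Spearman's coefficient is Pearson's coefficient computed on ranks, which is why it captures any monotone relationship rather than only a linear one. A minimal pure-Python sketch; the helper names are our own, and tied values receive the average of the ranks they occupy (the usual convention).

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho is Pearson's r applied to the ranks of the data.
    return pearson(average_ranks(x), average_ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotone but not linear
print(pearson(x, y), spearman(x, y))
```

For y = x³ on x = 1, ..., 10 the relationship is perfectly monotone but not linear: Spearman's rho is exactly 1, whereas Pearson's r is about 0.93.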
The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification
2023
Binary classification is a common task for which machine learning and computational statistics are used, and the area under the receiver operating characteristic curve (ROC AUC) has become the common standard metric to evaluate binary classifications in most scientific fields. The ROC curve has true positive rate (also called sensitivity or recall) on the y axis and false positive rate on the x axis, and the ROC AUC can range from 0 (worst result) to 1 (perfect result). The ROC AUC, however, has several flaws and drawbacks. This score is generated including predictions that obtained insufficient sensitivity and specificity, and moreover it says nothing about the positive predictive value (also known as precision) or the negative predictive value (NPV) obtained by the classifier, therefore potentially generating inflated, overoptimistic results. Since it is common to report ROC AUC alone without precision and negative predictive value, a researcher might erroneously conclude that their classification was successful. Furthermore, a given point in the ROC space does not identify a single confusion matrix nor a group of matrices sharing the same MCC value. Indeed, a given (sensitivity, specificity) pair can cover a broad MCC range, which casts doubt on the reliability of ROC AUC as a performance measure. In contrast, the Matthews correlation coefficient (MCC) generates a high score in its [-1, +1] interval only if the classifier scored a high value for all four basic rates of the confusion matrix: sensitivity, specificity, precision, and negative predictive value. A high MCC (for example, MCC = 0.9), moreover, always corresponds to a high ROC AUC, but not vice versa. In this short study, we explain why the Matthews correlation coefficient should replace the ROC AUC as the standard statistic in all scientific studies involving a binary classification, in all scientific fields.
Journal Article
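The claim that one (sensitivity, specificity) pair can correspond to very different MCC values is easy to reproduce. The sketch below fixes sensitivity and specificity at 0.9 and changes only the class sizes; the function names and counts are hypothetical illustrations, not data from the study.

```python
import math

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient (assumes no empty row or column)."""
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

def matrix_from_rates(sensitivity, specificity, positives, negatives):
    """Build (tp, fn, tn, fp) with the given rates and class sizes (rounded)."""
    tp = round(sensitivity * positives)
    tn = round(specificity * negatives)
    return tp, positives - tp, tn, negatives - tn

# Same ROC point (sensitivity = specificity = 0.9), different prevalence.
balanced = matrix_from_rates(0.9, 0.9, positives=100, negatives=100)
imbalanced = matrix_from_rates(0.9, 0.9, positives=10, negatives=1000)
print(balanced, mcc(*balanced))
print(imbalanced, mcc(*imbalanced))
```

Both matrices sit on the same ROC point, yet MCC falls from 0.8 with 100 positives and 100 negatives to about 0.26 with 10 positives and 1,000 negatives, because precision collapses to 9/109 (about 0.08).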
Research on Intrusion Detection Method Based on Pearson Correlation Coefficient Feature Selection Algorithm
by Wu, Chunwang; Chen, Pengtian; Li, Fei
in Algorithms, Correlation coefficients, Feature Selection
2021
The current era is the era of big data and 5G. Network security data differs from that of the past and is growing exponentially. Intrusion detection technology is an important line of defense for network security, and its ability to detect and process massive amounts of security data efficiently has become an important factor restricting its development. The feature selection method applied to intrusion detection data directly affects the efficiency of intrusion detection. Therefore, this paper proposes a feature selection algorithm based on the Pearson correlation coefficient, which reduces a large set of features, greatly decreasing the amount of security data that needs to be processed and effectively reducing the dimensionality of the data to increase intrusion detection efficiency.
Journal Article
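The abstract does not detail the algorithm, so the following is only a generic Pearson-based filter, not the authors' method: rank features by the absolute correlation with the class label, then skip any candidate strongly correlated with a feature already chosen. The function `select_features`, the cut-off `k`, the `redundancy_threshold` of 0.9, and the toy columns are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's r; returns 0.0 for a constant column (no linear information)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_features(columns, labels, k, redundancy_threshold=0.9):
    """Greedy filter: rank by |r| with the label, skip near-duplicate features."""
    # sorted() is stable, so ties keep the dictionary's insertion order.
    ranked = sorted(columns, key=lambda name: -abs(pearson(columns[name], labels)))
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        redundant = any(
            abs(pearson(columns[name], columns[s])) > redundancy_threshold
            for s in selected)
        if not redundant:
            selected.append(name)
    return selected

labels = [0, 0, 0, 1, 1, 1, 0, 1]          # 1 = hypothetical "intrusion"
columns = {
    "a": [float(v) for v in labels],       # perfectly informative
    "b": [2.0 * v for v in labels],        # scaled copy of "a" (redundant)
    "c": [1, 2, 3, 4, 5, 6, 7, 8],         # partly informative
    "d": [5] * 8,                          # constant, uninformative
}
print(select_features(columns, labels, k=2))
```

Here "b" is dropped as a duplicate of "a", and "c" is kept instead. Note that Pearson's r only measures linear association, so a filter of this kind can discard features whose relationship to the label is non-linear.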
Biostatistics series module 6: Correlation and linear regression
2016
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it yields P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
Journal Article
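The least-squares line y = a + bx and the coefficient of determination described above can both be computed from sums of squares. A minimal sketch (function names and data are hypothetical) that also confirms that, for simple linear regression, r² equals 1 − SS_res/SS_tot:

```python
import math

def least_squares_line(x, y):
    """Return intercept a, slope b, Pearson r and r^2 for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx          # slope minimizing the sum of squared residuals
    a = my - b * mx        # the fitted line passes through (mean x, mean y)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r, r * r

def coefficient_of_determination(x, y, a, b):
    """1 - SS_res / SS_tot: share of the variability of y explained by the line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
a, b, r, r2 = least_squares_line(x, y)
print(a, b, r, r2, coefficient_of_determination(x, y, a, b))
```

For this data the fit is y = 0.6 + 0.8x with r = 0.8, so r² = 0.64: 64% of the variability of y is attributed to its linear relation with x.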
Correction: Identifying the factors associated with cesarean section modeled with categorical correlation coefficients in partial least squares
[This corrects the article DOI: 10.1371/journal.pone.0219427.].
Journal Article
Is there a correlation between abundance and environmental suitability derived from ecological niche modelling? A meta-analysis
by Marcelo M. Weber; Richard D. Stevens; José Alexandre F. Diniz-Filho
in Abundance, Climate change, climatic factors
2017
It is thought that species abundance is correlated with environmental suitability and that environmental variables, scale, and type of model fitting can confound this relationship. We performed a meta-analysis to 1) test whether species abundance is positively correlated with environmental suitability derived from correlative ecological niche models (ENM), 2) test whether studies encompassing large areas within a species range (> 50%) exhibited higher abundance-suitability (AS) correlations than studies encompassing small areas within a species range (< 50%), 3) assess which modelling method provided higher AS correlation, and 4) compare the strength of the AS relationship between studies using only climatic variables and those that used both climatic and other environmental variables to derive suitability. We used correlation coefficients to measure the relationship between abundance and environmental suitability derived from ENM. Each correlation coefficient was considered an effect size in a random-effects multivariate meta-analysis. In all cases we found a significant positive relationship between abundance and suitability. This relationship was consistent regardless of scale of study, ENM method, or set of variables used to derive suitability. There was no difference in the strength of correlation between studies focusing on large or small areas within a species’ range or among ENM methods. Studies using other variables in combination with climate exhibited higher AS correlations than studies using only climatic variables. We conclude that occurrence data can be a reasonable proxy for abundance, especially for vertebrates, and that the use of local variables increases the strength of the AS relationship. Use of ENMs can significantly decrease survey costs and allow the study of large-scale abundance patterns using less information. Including only climatic variables in ENM may confound the relationship between abundance and suitability when compared to studies including variables taken locally. However, modelers and conservationists must be aware that high environmental suitability does not always indicate high abundance.
Journal Article
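Treating correlation coefficients as effect sizes usually means converting them with Fisher's z-transform before pooling. The sketch below uses the univariate DerSimonian-Laird random-effects estimator; this is a simplification swapped in for illustration, since the study itself used a multivariate random-effects model, and the function name and inputs are hypothetical.

```python
import math

def pool_correlations(rs, ns):
    """Random-effects pooled correlation (Fisher's z + DerSimonian-Laird tau^2).

    rs: per-study correlation coefficients; ns: per-study sample sizes (> 3).
    Returns (pooled r, estimated between-study variance tau^2 on the z scale).
    """
    if len(rs) < 2:
        raise ValueError("need at least two studies")
    z = [math.atanh(r) for r in rs]
    v = [1 / (n - 3) for n in ns]             # sampling variance of Fisher's z
    w = [1 / vi for vi in v]                  # fixed-effect (inverse-variance) weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]    # random-effects weights
    z_random = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    return math.tanh(z_random), tau2          # back-transform to the r scale

print(pool_correlations([0.3, 0.3, 0.3], [50, 80, 120]))
print(pool_correlations([0.1, 0.5], [40, 60]))
```

When every study reports the same r, tau² is 0 and the pooled value is that r; heterogeneous inputs yield tau² > 0 and a pooled r lying between the smallest and largest study values.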
Why Cohen’s Kappa should be avoided as performance measure in classification
by Tibau, Xavier-Andoni; Delgado, Rosario
in Accuracy, Artificial intelligence, Biology and Life Sciences
2019
We show that Cohen's Kappa and the Matthews Correlation Coefficient (MCC), both widespread and well-tested measures of performance in multi-class classification, are correlated in most situations, although they can differ in others. Indeed, although both match in the symmetric case, we consider different unbalanced situations in which Kappa exhibits an undesired behaviour, i.e. a worse classifier gets a higher Kappa score, differing qualitatively from that of MCC. The debate about the incoherence in the behaviour of Kappa revolves around the convenience, or not, of using a relative metric, which makes the interpretation of its values difficult. We extend these concerns by showing that its pitfalls can go even further. Through experimentation, we present a novel approach to this topic. We carry out a comprehensive study that identifies a scenario in which the contradictory behaviour between MCC and Kappa emerges. Specifically, we find that as the entropy of the off-diagonal elements of a classifier's confusion matrix decreases to zero, the discrepancy between Kappa and MCC rises, pointing to an anomalous performance of the former. We believe that this finding rules out Kappa as a general performance measure for comparing classifiers.
Journal Article
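The relationship between the two measures is easiest to see from the confusion matrix itself. A minimal sketch, with a hypothetical function name and matrices, using the usual multi-class generalization of MCC (rows are true classes, columns are predictions, and the matrix is assumed non-degenerate). Both formulas share the numerator c·s − Σ pₖtₖ; they coincide when the predicted and true class totals match and diverge otherwise.

```python
import math

def kappa_and_mcc(matrix):
    """Cohen's kappa and multi-class MCC from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    k = len(matrix)
    s = sum(sum(row) for row in matrix)                          # total samples
    c = sum(matrix[i][i] for i in range(k))                      # correct predictions
    t = [sum(matrix[i]) for i in range(k)]                       # true count per class
    p = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    chance = sum(pk * tk for pk, tk in zip(p, t))
    # kappa = (p_o - p_e) / (1 - p_e), multiplied through by s^2
    kappa = (c * s - chance) / (s * s - chance)
    mcc = (c * s - chance) / math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return kappa, mcc

print(kappa_and_mcc([[40, 10], [10, 40]]))  # symmetric errors
print(kappa_and_mcc([[50, 0], [20, 30]]))   # all errors fall in one cell
```

For [[40, 10], [10, 40]] both equal 0.6. For [[50, 0], [20, 30]] kappa is still 0.6, but MCC is about 0.65, because the classifier over-predicts the first class and the two measures account for that imbalance differently.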