## Metrics to evaluate classification models with R code: Confusion Matrix, Sensitivity, Specificity, Cohen’s Kappa Value, McNemar’s Test

This article discusses various evaluation / performance metrics for machine learning based classification models. There are at least 15 different metrics for evaluating such a model, including classification accuracy, the confusion matrix, sensitivity, specificity, detection rate, precision, recall, F1 score, the KS statistic, the ROC curve, Somers’ D statistic, Cohen’s kappa value, and McNemar’s test p-value.

Before you actually test or evaluate your classification model, you should find out the minimum accuracy the model must have to be worth considering at all. You can find this minimum accuracy using the concept of the NULL MODEL: a prediction made without any model at all, simply using the distribution of yes / no in the target / outcome variable.
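As a quick sketch, the null model simply predicts the majority class for everyone; its accuracy is the baseline any real model must beat. The counts below assume the same test data used later in this article (255 non-defaulters, 147 defaulters).

```r
# Null model: always predict the majority class of the target variable.
# (Counts assumed from the confusion matrix shown later in this article.)
actual <- c(rep(0, 255), rep(1, 147))

majority_class <- names(which.max(table(actual)))
null_accuracy  <- max(table(actual)) / length(actual)
null_accuracy   # 255/402 = 0.6343, which caret later reports as the No Information Rate
```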

The most important and easiest-to-use metric for evaluating the performance of your classification model is classification accuracy.

**Classification accuracy** – This is the percentage of correctly classified observations. Say your sample size is 1000 and the number of correctly predicted observations is 700. The classification accuracy is then 70%.

**Classification metrics with R codes**

Let us say you are working with “test” data in which the actual target variable is “default_actual”. The predicted target variable is named “default_class”.

Classification accuracy for test data sample

> classification <- table(test$default_actual,test$default_class)

> classification

With this we get the confusion matrix.

`      0   1`
`  0 216  39`
`  1  79  68`

Let us calculate the classification accuracy of the model. The diagonal elements of the confusion matrix have been correctly classified (i.e. the 0-0 and 1-1 cells). The classification accuracy is (216 + 68) / (216 + 68 + 39 + 79) = 0.7064 = 70.64%.

> accuracy_Test <- sum(diag(classification )) / sum(classification)

> accuracy_Test

[1] 0.7064677

**Misclassification error**

> 1 - sum(diag(classification)) / sum(classification)

[1] 0.2935323

The other way to evaluate classification models in R is with the caret package. Load caret before running the commands below, using library().

The command for getting the confusion matrix using caret package is

confusionMatrix()

> library(caret)

Loading required package: lattice

Loading required package: ggplot2

> confusionMatrix(test$default_class, test$default_actual, positive ="1", mode="everything")

`Confusion Matrix and Statistics`

`          Reference`

`Prediction   0   1`

`         0 216  79`

`         1  39  68`

`Accuracy : 0.7065`

`95% CI : (0.6593, 0.7506)`

`No Information Rate : 0.6343`

`P-Value [Acc > NIR] : 0.0013778`

`Kappa : 0.3286`

`Mcnemar's Test P-Value : 0.0003304`

`Sensitivity : 0.4626`

`Specificity : 0.8471`

`Pos Pred Value : 0.6355`

`Neg Pred Value : 0.7322`

`Precision : 0.6355`

`Recall : 0.4626`

`F1 : 0.5354`

`Prevalence : 0.3657`

`Detection Rate : 0.1692`

`Detection Prevalence : 0.2662`

`Balanced Accuracy : 0.6548`

`'Positive' Class : 1`

The first thing you get is the confusion matrix, where the rows are the predicted values and the columns are the reference (i.e. actual) values from the test data.

This command also gives you the classification accuracy under the heading

**1. Accuracy**

The accuracy of the model is 70.65%. Accuracy alone doesn’t tell us what kind of errors the model makes when predicting.
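The 95% CI and the P-Value [Acc > NIR] lines in the caret output can be reproduced with base R’s `binom.test()` (to my understanding, caret computes them the same way internally): an exact confidence interval for the accuracy, and a one-sided test of whether the accuracy beats the No Information Rate.

```r
# Reproduce caret's 95% CI and "P-Value [Acc > NIR]" with binom.test().
# 284 correct predictions out of 402; NIR = 255/402 (majority-class share).
correct <- 284; n <- 402; nir <- 255 / 402

ci  <- binom.test(correct, n)$conf.int                 # exact 95% CI for accuracy
pnv <- binom.test(correct, n, p = nir,
                  alternative = "greater")$p.value     # one-sided test vs NIR
round(ci, 4)   # approx. (0.6593, 0.7506)
round(pnv, 7)  # approx. 0.0013778
```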

The classification error is 29.35%

The breakdown of the confusion matrix is given below.

|                 | Actual: No         | Actual: Yes        |
|-----------------|--------------------|--------------------|
| Predicted: No   | TRUE NEGATIVE (d)  | FALSE NEGATIVE (c) |
| Predicted: Yes  | FALSE POSITIVE (b) | TRUE POSITIVE (a)  |
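The four cells can be pulled directly out of a caret-style matrix (rows = prediction, columns = actual), using the counts from this article’s example:

```r
# Confusion matrix with rows = prediction, columns = reference (actual).
cm <- matrix(c(216, 39, 79, 68), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

TN <- cm["0", "0"]   # 216 correctly predicted non-defaulters (d)
FP <- cm["1", "0"]   #  39 wrongly flagged as defaulters      (b)
FN <- cm["0", "1"]   #  79 missed defaulters                  (c)
TP <- cm["1", "1"]   #  68 correctly predicted defaulters     (a)
```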

**2. TRUE POSITIVE** – Correctly predicted positive outcome: cell a in the matrix, i.e. the number 68 in the confusion matrix. We correctly predicted that 68 customers will default on their loan.

**3. TRUE NEGATIVES** – Correctly predicted negative outcome: cell d in the matrix. We correctly predicted that 216 customers will not default on their loan.

**4. FALSE POSITIVES** – Incorrectly predicted to be positive: cell b in the matrix. We incorrectly predicted that 39 customers will default on their loan.

**5. FALSE NEGATIVES** – Incorrectly predicted to be negative: cell c in the matrix. We incorrectly predicted that 79 customers will not default on their loan.

**6. SENSITIVITY** – Also called the recall rate (R reports Recall as a separate line in the output). Sensitivity = TP / (TP + FN) = a / (a + c) = 68 / (68 + 79) = 0.4626

**7. RECALL = SENSITIVITY** = .4626

**8. SPECIFICITY** = TN / (TN + FP) = d / (d + b) = 216 / (216 + 39) = 0.8471
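Sensitivity and specificity can be checked by hand from the counts above, matching caret’s output:

```r
# Hand-computed sensitivity and specificity from the example counts.
TP <- 68; TN <- 216; FP <- 39; FN <- 79

sensitivity <- TP / (TP + FN)   # 68/147
specificity <- TN / (TN + FP)   # 216/255
round(c(sensitivity, specificity), 4)   # 0.4626 0.8471
```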

**9. PPV (Pos Pred Value)** = (sensitivity \* prevalence) / ((sensitivity \* prevalence) + ((1 − specificity) \* (1 − prevalence))) = 0.6355

**10. NPV (Neg Pred Value)** = (specificity \* (1 − prevalence)) / (((1 − sensitivity) \* prevalence) + (specificity \* (1 − prevalence))) = 0.7322
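The PPV/NPV formulas can be verified two ways: via the Bayes-style formulas above, and directly from the counts. Both should agree.

```r
# PPV/NPV via the prevalence-based formulas, cross-checked against the counts.
TP <- 68; TN <- 216; FP <- 39; FN <- 79
sens <- TP / (TP + FN); spec <- TN / (TN + FP)
prev <- (TP + FN) / (TP + TN + FP + FN)

ppv_formula <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
npv_formula <- (spec * (1 - prev)) / ((1 - sens) * prev + spec * (1 - prev))
c(ppv_formula, TP / (TP + FP))   # both 0.6355
c(npv_formula, TN / (TN + FN))   # both 0.7322
```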

**11. Precision** = A / (A + B) = 0.6355

**12. F1** = (1 + beta^2) \* precision \* recall / ((beta^2 \* precision) + recall) = 0.5354

Where beta = 1 for this function.
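With beta = 1 the formula reduces to the harmonic mean of precision and recall, which is easy to check from the counts:

```r
# F1 score from precision and recall (beta = 1 gives the harmonic mean).
TP <- 68; FP <- 39; FN <- 79
precision <- TP / (TP + FP); recall <- TP / (TP + FN); beta <- 1

F1 <- (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
round(F1, 4)   # 0.5354
```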

**13. Prevalence** = (A + C) / (A + B + C + D) = 0.3657

**14. Detection Rate** = A / (A + B + C + D) = 0.1692

**15. Detection Prevalence** = (A + B) / (A + B + C + D) = 0.2662

**16. Balanced Accuracy** = (sensitivity + specificity) / 2 = 0.6548
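The remaining ratios can all be computed from the counts (using caret’s notation A = TP, B = FP, C = FN, D = TN):

```r
# Prevalence, detection rate, detection prevalence and balanced accuracy.
TP <- 68; TN <- 216; FP <- 39; FN <- 79; n <- TP + TN + FP + FN

prevalence           <- (TP + FN) / n    # share of actual positives
detection_rate       <- TP / n           # true positives as a share of all cases
detection_prevalence <- (TP + FP) / n    # share of predicted positives
balanced_accuracy    <- (TP / (TP + FN) + TN / (TN + FP)) / 2
round(c(prevalence, detection_rate, detection_prevalence, balanced_accuracy), 4)
# 0.3657 0.1692 0.2662 0.6548
```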

**17. Cohen’s Kappa** = (Observed Accuracy – Expected Accuracy) / (1 – Expected Accuracy) = 0.3286
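Cohen’s kappa can be computed by hand: the expected accuracy is what random agreement with the same marginal class frequencies would produce.

```r
# Cohen's kappa from observed vs chance-expected accuracy.
TP <- 68; TN <- 216; FP <- 39; FN <- 79; n <- TP + TN + FP + FN

observed <- (TP + TN) / n
expected <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
kappa    <- (observed - expected) / (1 - expected)
round(kappa, 4)   # 0.3286
```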

**18. McNemar’s Test P-Value** = 0.0003304
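McNemar’s test compares the two off-diagonal (disagreement) cells of the confusion matrix; base R’s `mcnemar.test()` on the same matrix reproduces caret’s p-value.

```r
# McNemar's test on the confusion matrix (rows = prediction, cols = actual).
cm <- matrix(c(216, 39, 79, 68), nrow = 2)

p_mcnemar <- mcnemar.test(cm)$p.value
round(p_mcnemar, 7)   # approx. 0.0003304
```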

Source: RStudio help