Metrics to evaluate classification models with R code: Confusion Matrix, Sensitivity, Specificity, Cohen’s Kappa, McNemar’s Test

This article discusses various evaluation/performance metrics for machine learning based classification models. There are at least 15 different metrics for evaluating such a model, including classification accuracy, the confusion matrix, sensitivity, specificity, detection rate, precision, recall, the F1 score, the KS statistic, the ROC curve, Somers’ D statistic, Cohen’s kappa, and McNemar’s test p-value.

Before you actually test or evaluate your classification model, you should find out the minimum accuracy the model must achieve to be worth considering. You can find this minimum accuracy using the concept of the NULL MODEL. The NULL MODEL means making predictions without any model, using only the distribution of yes/no values in the target (outcome) variable. A short sketch of this calculation is given below.
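As a minimal sketch (assuming, as later in this article, a data frame test whose actual outcomes are stored in default_actual), the null-model accuracy is simply the proportion of the majority class:

# Null model: always predict the most frequent class of the target variable.
# Its accuracy is simply the proportion of that class in the data.
class_counts  <- table(test$default_actual)
null_accuracy <- max(class_counts) / sum(class_counts)
null_accuracy

For the test data used below this is exactly the “No Information Rate” of 0.6343 that caret reports, so any useful model has to beat that figure.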

The most important and easiest-to-use metric for evaluating the performance of a classification model is classification accuracy.

Classification accuracy – This is the percentage of correctly classified observations. Let us say your sample size is 1000 and the number of correctly predicted observations is 700. Then the classification accuracy is 700 / 1000 = 70%.
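A quick check of this toy arithmetic in R:

correct <- 700    # correctly classified observations
total   <- 1000   # sample size
correct / total   # 0.7, i.e. 70% classification accuracy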

Classification metrics with R codes

Let us say you are working with the “test” data and the actual target variable in the data is “default_actual”. The predicted target variable is named “default_class”.

Classification accuracy for test data sample

> classification <- table(test$default_actual, test$default_class)
> classification

With this we get the confusion matrix.

     0   1
  0 216  39
  1  79  68

Let us calculate the classification accuracy of the model. The diagonal elements of the confusion matrix are the correctly classified observations (i.e. the 0-0 and 1-1 cells). The classification accuracy is calculated as (216 + 68) / (216 + 68 + 39 + 79) = 0.7065 = 70.65%.

> accuracy_Test <- sum(diag(classification)) / sum(classification)
> accuracy_Test
[1] 0.7064677

Misclassification error

> 1 - sum(diag(classification)) / sum(classification)
[1] 0.2935323   

The other way to evaluate classification models in R is to use the caret package. Load the caret package with the library() command.

The command for getting the confusion matrix using the caret package is confusionMatrix().
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> confusionMatrix(test$default_class, test$default_actual, positive = "1", mode = "everything")

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 216  79
         1  39  68

               Accuracy : 0.7065
                 95% CI : (0.6593, 0.7506)
    No Information Rate : 0.6343
    P-Value [Acc > NIR] : 0.0013778

                  Kappa : 0.3286

 Mcnemar's Test P-Value : 0.0003304

            Sensitivity : 0.4626
            Specificity : 0.8471
         Pos Pred Value : 0.6355
         Neg Pred Value : 0.7322
              Precision : 0.6355
                 Recall : 0.4626
                     F1 : 0.5354
             Prevalence : 0.3657
         Detection Rate : 0.1692
   Detection Prevalence : 0.2662
      Balanced Accuracy : 0.6548

       'Positive' Class : 1

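Note that confusionMatrix() expects its first two arguments to be factors with the same levels. If the predicted and actual columns happen to be stored as 0/1 numbers (an assumption about your data; adjust as needed), a minimal conversion sketch is:

# Convert 0/1 columns to factors with identical levels before calling confusionMatrix()
test$default_class  <- factor(test$default_class,  levels = c("0", "1"))
test$default_actual <- factor(test$default_actual, levels = c("0", "1"))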

The first thing that you get is the confusion matrix output, where the rows are the predicted values and the columns are the reference (actual) values from the test data.

This command also gives you the classification accuracy under the heading

1. Accuracy

The accuracy of the model is 70.65%. This accuracy metric doesn’t tell us what kind of errors we are making when predicting.

The classification error is 29.35%

The breakdown of the confusion matrix is given below:

                        Actual
                        No                    Yes
  Predicted / Model No  TRUE NEGATIVE (d)     FALSE NEGATIVE (c)
                    Yes FALSE POSITIVE (b)    TRUE POSITIVE (a)

2. TRUE POSITIVE – Correctly predicted positive outcome: cell a in the matrix, the number 68 in the confusion matrix. We correctly predicted that 68 customers will default on the loan.

3. TRUE NEGATIVE – Correctly predicted negative outcome: cell d in the matrix. We correctly predicted that 216 customers will not default on the loan.

4. FALSE POSITIVE – Incorrectly predicted as positive: cell b in the matrix. We incorrectly predicted that 39 customers will default on the loan.

5. FALSE NEGATIVE – Incorrectly predicted as negative: cell c in the matrix. We incorrectly predicted that 79 customers will not default on the loan.

6. SENSITIVITY – Also called the recall rate (R reports it again as Recall). Sensitivity = TP / (TP + FN) = a / (a + c) = 68 / (68 + 79) = 0.4626. (All of the statistics below can be reproduced by hand from the four cell counts; see the sketch after this list.)

7. RECALL = SENSITIVITY = 0.4626

8. SPECIFICITY = TN / (TN + FP) = d / (d + b) = 216 / (216 + 39) = 0.8471

9. PPV (Pos Pred Value) = (sensitivity * prevalence) / ((sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence))) = 0.6355

10. NPV (Neg Pred Value) = (specificity * (1 - prevalence)) / (((1 - sensitivity) * prevalence) + (specificity * (1 - prevalence))) = 0.7322

11. Precision = A / (A + B) = 0.6355, where A, B, C, D follow the caret documentation and correspond to cells a, b, c, d above (A = TP, B = FP, C = FN, D = TN).

12. F1 = (1 + beta^2) * precision * recall / ((beta^2 * precision) + recall) = 0.5354

Where beta = 1 for this function.

13. Prevalence = (A + C) / (A + B + C + D) = 0.3657

14. Detection Rate = A / (A + B + C + D) = 0.1692

15. Detection Prevalence = (A + B) / (A + B + C + D) = 0.2662

16. Balanced Accuracy = (sensitivity + specificity) / 2 = 0.6548

17. Cohen’s Kappa = (Observed Accuracy – Expected Accuracy) / (1 – Expected Accuracy) = 0.3286

18. McNemar’s Test P-Value = 0.0003304
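As a cross-check, every statistic in the list above can be reproduced by hand from the four cell counts of the confusion matrix. A minimal sketch in base R (the counts are taken from the confusion matrix above; mcnemar.test() and the kappa formula are the standard equivalents of what caret reports):

# Cell counts for positive class = 1
TP <- 68    # predicted 1, actual 1
FP <- 39    # predicted 1, actual 0
FN <- 79    # predicted 0, actual 1
TN <- 216   # predicted 0, actual 0
n  <- TP + FP + FN + TN

sensitivity <- TP / (TP + FN)                          # 0.4626 (= recall)
specificity <- TN / (TN + FP)                          # 0.8471
precision   <- TP / (TP + FP)                          # 0.6355 (= Pos Pred Value here)
npv         <- TN / (TN + FN)                          # 0.7322
f1          <- 2 * precision * sensitivity / (precision + sensitivity)  # 0.5354
prevalence  <- (TP + FN) / n                           # 0.3657
detection_rate       <- TP / n                         # 0.1692
detection_prevalence <- (TP + FP) / n                  # 0.2662
balanced_accuracy    <- (sensitivity + specificity) / 2                 # 0.6548

# Cohen's kappa: observed vs. chance-expected agreement
observed <- (TP + TN) / n
expected <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
kappa    <- (observed - expected) / (1 - expected)                      # 0.3286

# McNemar's test uses only the two disagreement (off-diagonal) cells
mcnemar.test(matrix(c(TN, FN, FP, TP), nrow = 2))      # p-value ~ 0.0003304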

Source: R help documentation for caret’s confusionMatrix()
