This article discusses various evaluation / performance metrics for machine learning classification models. There are at least 15 different metrics for evaluating such a model, including classification accuracy, the confusion matrix, sensitivity, specificity, detection rate, precision, recall, F1 score, the KS statistic, the ROC curve, Somers' D statistic, Cohen's kappa and McNemar's test p-value.
Before you actually test or evaluate your classification model, you should find out the minimum accuracy the model must have to be worth considering. This minimum accuracy can be found using the concept of a NULL MODEL. A NULL MODEL makes predictions without any model at all, using only the distribution of yes / no in the target / outcome variable: it always predicts the majority class.
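As a sketch, the null-model accuracy can be computed directly from the class distribution of the target variable. The counts below (255 non-defaulters, 147 defaulters) are taken from the confusion matrix used later in this article:

```r
# Null model: always predict the majority class of the target variable.
# The class counts are assumed from the example discussed later in this article.
default_actual <- c(rep(0, 255), rep(1, 147))

# Proportion of each class in the outcome variable
class_dist <- prop.table(table(default_actual))

# The null model predicts the majority class for every observation,
# so its accuracy is simply the largest class proportion.
null_accuracy <- max(class_dist)
round(null_accuracy, 4)  # 0.6343
```

This is exactly the "No Information Rate" that caret reports: any model worth keeping should beat this baseline.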
The most important and easy to use metric for evaluating the performance of your classification model is the classification accuracy.
Classification accuracy – This is the percentage of correctly classified observations. Let us say your sample size is 1000 and the number of correctly predicted observations is 700. Then the classification accuracy is 70%.
Classification metrics with R codes
Let us say you are handling the "test" data, in which the actual target variable is "default_actual". The predicted target variable is named "default_class".
Classification accuracy for test data sample
> classification <- table(test$default_actual,test$default_class)
With this we get the confusion matrix.
    0   1
0 216  39
1  79  68
Let us calculate the classification accuracy of the model. The diagonal elements of the classification matrix have been correctly classified (i.e. the 0-0 and 1-1 cells of the confusion matrix). The classification accuracy is calculated as (216 + 68) / (216 + 68 + 39 + 79) = 0.7065 = 70.65%
> accuracy_Test <- sum(diag(classification )) / sum(classification)
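If you want to check this arithmetic without the raw data, the confusion matrix can be rebuilt from the counts shown above (a sketch; the four cell values are the ones from this article's example):

```r
# Rebuild the confusion matrix from the counts shown above
# (rows = actual values, columns = predicted values)
classification <- matrix(c(216, 79, 39, 68), nrow = 2,
                         dimnames = list(actual = c("0", "1"),
                                         predicted = c("0", "1")))

# Accuracy = correctly classified (diagonal) / all observations
accuracy_Test <- sum(diag(classification)) / sum(classification)
round(accuracy_Test, 4)  # 0.7065
```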
The other way to evaluate classification models in R is the caret package. Load the caret package using the library() command:
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
The command for getting the confusion matrix using the caret package is
> confusionMatrix(test$default_class, test$default_actual, positive ="1", mode="everything")
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0 216  79
         1  39  68
Accuracy : 0.7065
95% CI : (0.6593, 0.7506)
No Information Rate : 0.6343
P-Value [Acc > NIR] : 0.0013778
Kappa : 0.3286
Mcnemar's Test P-Value : 0.0003304
Sensitivity : 0.4626
Specificity : 0.8471
Pos Pred Value : 0.6355
Neg Pred Value : 0.7322
Precision : 0.6355
Recall : 0.4626
F1 : 0.5354
Prevalence : 0.3657
Detection Rate : 0.1692
Detection Prevalence : 0.2662
Balanced Accuracy : 0.6548
'Positive' Class : 1
The first thing that you get is the confusion matrix output, where the rows are the predicted values and the columns are the reference (actual) values from the test data.
This command also gives you the classification accuracy under the heading Accuracy.
The accuracy of the model is 70.65%. This accuracy metric doesn't tell us what kinds of errors we are making while predicting.
The classification error is 29.35%
The breakdown of the confusion matrix is given below:

|Predicted \ Actual|No|Yes|
|---|---|---|
|No|TRUE NEGATIVE (d)|FALSE NEGATIVE (c)|
|Yes|FALSE POSITIVE (b)|TRUE POSITIVE (a)|
1. TRUE POSITIVE – Correctly predicted positive outcome: cell a in the matrix, the value 68 in the confusion matrix. We correctly predicted that 68 customers will default on the loan.
2. TRUE NEGATIVE – Correctly predicted negative outcome: cell d in the matrix, the value 216. We correctly predicted that 216 customers will not default on the loan.
3. FALSE POSITIVE – Incorrectly predicted to be positive: cell b in the matrix, the value 39. We incorrectly predicted that 39 customers will default on the loan.
4. FALSE NEGATIVE – Incorrectly predicted to be negative: cell c in the matrix, the value 79. We incorrectly predicted that 79 customers will not default on the loan.
5. SENSITIVITY – Also called the recall rate; you also get Recall as an output in R. Sensitivity = TP / (TP + FN) = a / (a + c) = 0.4626
6. RECALL = SENSITIVITY = 0.4626
7. SPECIFICITY = TN / (TN + FP) = d / (d + b) = 0.8471
8. PPV (Pos Pred Value) = (sensitivity * prevalence) / ((sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence))) = 0.6355
9. NPV (Neg Pred Value) = (specificity * (1 - prevalence)) / (((1 - sensitivity) * prevalence) + (specificity * (1 - prevalence))) = 0.7322
10. PRECISION = TP / (TP + FP) = a / (a + b) = 0.6355
11. F1 = (1 + beta^2) * precision * recall / ((beta^2 * precision) + recall) = 0.5354, where beta = 1 for this function.
12. PREVALENCE = (a + c) / (a + b + c + d) = 0.3657
13. DETECTION RATE = a / (a + b + c + d) = 0.1692
14. DETECTION PREVALENCE = (a + b) / (a + b + c + d) = 0.2662
15. BALANCED ACCURACY = (sensitivity + specificity) / 2 = 0.6548
16. COHEN'S KAPPA = (observed accuracy – expected accuracy) / (1 – expected accuracy) = 0.3286
17. MCNEMAR'S TEST P-VALUE = 0.0003304
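The formulas above can be verified against caret's output by recomputing them directly from the four confusion-matrix cells. The sketch below uses the labels a = TP, b = FP, c = FN, d = TN from the breakdown table:

```r
# Recompute the caret statistics from the four confusion-matrix cells
a <- 68   # true positives
b <- 39   # false positives
c <- 79   # false negatives
d <- 216  # true negatives
n <- a + b + c + d

sensitivity <- a / (a + c)            # recall
specificity <- d / (d + b)
precision   <- a / (a + b)            # positive predictive value
f1          <- 2 * precision * sensitivity / (precision + sensitivity)
prevalence  <- (a + c) / n
detection_rate       <- a / n
detection_prevalence <- (a + b) / n
balanced_accuracy    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed accuracy corrected for chance agreement,
# with expected accuracy derived from the row and column marginals
observed_accuracy <- (a + d) / n
expected_accuracy <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2
kappa <- (observed_accuracy - expected_accuracy) / (1 - expected_accuracy)

round(c(sensitivity, specificity, precision, f1, kappa), 4)
# 0.4626 0.8471 0.6355 0.5354 0.3286
```

The values match the confusionMatrix() output above, which is a useful sanity check that you have mapped the cells of your confusion matrix to TP, FP, FN and TN correctly.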
Source: RStudio help