Decision trees can be used for both classification and regression problems.
Terminology Related to Decision Trees
• Branch / Sub-Tree – a subsection of the entire tree.
• Algorithms used in decision trees:
• Gini Index – the higher the value of Gini, the higher the homogeneity of the node. CART (Classification and Regression Trees) uses the Gini method to create binary splits.
• Chi-Square – the higher the value of Chi-Square, the higher the statistical significance of the difference between a sub-node and its parent node.
• Reduction in Variance – used for continuous target variables (regression problems); the algorithm chooses the split that most reduces the variance of the target.
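As a small illustration of two of these splitting criteria, the Gini score of a node and the reduction in variance of a candidate split can be computed directly. This is a minimal sketch in base R; the function names and the example node vectors are invented for illustration and do not come from any library:

```r
# Gini score of a node: sum of squared class proportions.
# Higher values mean a more homogeneous (purer) node.
gini_score <- function(labels) {
  p <- table(labels) / length(labels)
  sum(p^2)
}

pure_node  <- c("yes", "yes", "yes", "yes")   # one class only
mixed_node <- c("yes", "no", "yes", "no")     # 50/50 split

gini_score(pure_node)   # 1.0 -> maximally homogeneous
gini_score(mixed_node)  # 0.5 -> least homogeneous for two classes

# Reduction in variance (regression): variance of the parent node
# minus the weighted variance of the two child nodes.
reduction_in_variance <- function(parent, left, right) {
  w_l <- length(left)  / length(parent)
  w_r <- length(right) / length(parent)
  var(parent) - (w_l * var(left) + w_r * var(right))
}
```

A split is considered good under these criteria when it raises the children's Gini scores (classification) or yields a large positive reduction in variance (regression).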
If we can use logistic regression for classification problems and linear regression for regression problems, why do we need trees at all?
Which algorithm to use depends on the type of problem we are solving.
• If the relationship between the dependent and independent variables is well approximated by a linear model, linear regression will outperform a tree-based model.
• If there is high non-linearity and a complex relationship between the dependent and independent variables, a tree model will outperform a classical regression method.
• If you need a model that is easy to explain to people, a decision tree is usually easier to interpret than a linear model.
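The linear-versus-tree trade-off above can be seen concretely by fitting both models to a deliberately non-linear relationship. The simulated data and variable names below are invented for illustration; only `lm` and `rpart` are real functions:

```r
library(rpart)

set.seed(42)
x <- runif(200, -3, 3)
y <- sin(2 * x) + rnorm(200, sd = 0.2)   # highly non-linear target
d <- data.frame(x = x, y = y)

lin  <- lm(y ~ x, data = d)                       # straight-line fit
tree <- rpart(y ~ x, data = d, method = "anova")  # regression tree

# Compare in-sample RMSE: the piecewise-constant tree tracks the
# sine shape far better than the single straight line can.
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
rmse(d$y, predict(lin, d))
rmse(d$y, predict(tree, d))
```

On this data the tree's RMSE is clearly lower; on data with a genuinely linear trend the comparison flips, which is exactly the point of the bullets above.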
CART decision tree algorithm
library(rpart)  # CART implementation in R

# Fit a classification tree on the Titanic training data
fit <- rpart(Survived ~ Pclass + Sex + Age + SibSp + Embarked,
             data = train, method = "class")

# fit is our decision tree model; visualize the tree
plot(fit, uniform = TRUE)
text(fit, use.n = TRUE, cex = 0.8)

# Make the prediction using the decision tree model
Prediction <- predict(fit, test, type = "class")
submit <- data.frame(PassengerId = test$PassengerId, Survived = Prediction)
write.csv(submit, file = "myfirstdtree.csv", row.names = FALSE)
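The snippet above assumes the Kaggle Titanic `train` and `test` data frames are already loaded. As a self-contained variant of the same fit/predict/export workflow, here is the equivalent on R's built-in `iris` data; the split, object names, and output file name are arbitrary choices for illustration:

```r
library(rpart)

set.seed(1)
idx        <- sample(nrow(iris), 100)
iris_train <- iris[idx, ]    # 100 rows for fitting
iris_test  <- iris[-idx, ]   # 50 held-out rows

# Fit a classification tree, predict on the held-out rows,
# and write the predictions to CSV, mirroring the Titanic example.
fit2   <- rpart(Species ~ ., data = iris_train, method = "class")
pred   <- predict(fit2, iris_test, type = "class")
submit <- data.frame(Row = as.integer(rownames(iris_test)), Species = pred)
write.csv(submit, file = "iris_dtree.csv", row.names = FALSE)
```

The structure is identical: `rpart()` with `method = "class"` for a classification tree, `predict(..., type = "class")` for hard labels, and `write.csv()` for the output file.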