## Decision Trees with R Code

Decision trees can be used for both classification and regression problems.

**Terminology Related to Decision Trees**

- **Root node** – the topmost node, representing the entire sample, which is split first.

- **Splitting** – dividing a node into two or more sub-nodes.

- **Decision node** – a sub-node that splits into further sub-nodes.

- **Leaf/terminal node** – a node that does not split any further.

- **Pruning** – removing sub-nodes of a decision node to reduce overfitting; the opposite of splitting.

- **Branch/sub-tree** – a subsection of the entire tree.

- **Parent/child node** – a node that is split is the parent of the resulting sub-nodes (its children).

**Splitting Criteria (Algorithms) Used in Decision Trees**

- **Gini index** – the higher the Gini score (the sum of squared class proportions in a node), the higher the homogeneity. CART (Classification and Regression Trees) uses the Gini method to create binary splits.

- **Chi-square** – the higher the chi-square value, the higher the statistical significance of the difference between a sub-node and its parent node.

- **Information gain** – based on entropy; the split that produces the largest decrease in entropy (i.e., the most homogeneous sub-nodes) is chosen.

- **Reduction in variance** – used for continuous target variables (regression problems); the split that minimizes the weighted variance of the sub-nodes is chosen.
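These criteria are easy to compute by hand. The sketch below (plain base R, with an invented two-class example) shows the Gini score, entropy, and information gain as defined above:

```r
# Gini score of a node: sum of squared class proportions.
# 1 = perfectly pure node, 0.5 = an even two-class mix.
gini <- function(p) sum(p^2)

# Entropy of a node: 0 = pure, 1 bit = an even two-class mix.
entropy <- function(p) {
  p <- p[p > 0]              # avoid log2(0)
  -sum(p * log2(p))
}

# Example: a node with 80% of one class and 20% of the other
p <- c(0.8, 0.2)
gini(p)       # 0.68 -> fairly homogeneous
entropy(p)    # ~0.72 bits

# Information gain of a split = parent entropy minus the
# weighted average entropy of the child nodes.
info_gain <- function(parent, children, weights) {
  entropy(parent) - sum(weights * sapply(children, entropy))
}

# A 50/50 parent split into two mostly-pure, equal-sized children
info_gain(c(0.5, 0.5),
          list(c(0.9, 0.1), c(0.1, 0.9)),
          weights = c(0.5, 0.5))   # ~0.53: a useful split
```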

**If we can use logistic regression for classification problems and linear regression for regression problems, why do we need trees?**

**The algorithm to use depends on the type of problem we are solving.**

- If the relationship between the dependent and independent variables is well approximated by a linear model, linear regression will outperform a tree-based model.

- If there is a highly non-linear and complex relationship between the dependent and independent variables, a tree model will outperform a classical regression method.

- If you need to build a model that is easy to explain to people, a decision tree is usually easier to interpret than a linear model.
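The non-linearity point is easy to see on simulated data. The sketch below (an illustrative comparison with invented data, not from the original text) fits both a linear model and an `rpart` regression tree to a sine-shaped relationship:

```r
library(rpart)

set.seed(42)
# Simulated non-linear relationship: y = sin(x) + noise
x <- runif(300, 0, 10)
y <- sin(x) + rnorm(300, sd = 0.2)
d <- data.frame(x = x, y = y)

# Linear model: forced to fit a straight line through a sine wave
lin  <- lm(y ~ x, data = d)
# Regression tree (method = "anova" for continuous targets)
tree <- rpart(y ~ x, data = d, method = "anova")

# In-sample root-mean-squared error for each model
rmse <- function(m) sqrt(mean((d$y - predict(m, d))^2))
rmse(lin)   # close to sd(y): the straight line captures almost nothing
rmse(tree)  # much lower: the tree approximates the sine shape piecewise
```

The tree wins here because its piecewise-constant splits can follow the curve, while the linear model cannot.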

**CART decision tree algorithm in R**

```r
# Install the plotting packages (only needed once)
install.packages('rattle')
install.packages('rpart.plot')
install.packages('RColorBrewer')

library(rpart)        # CART implementation
library(rattle)       # fancyRpartPlot()
library(rpart.plot)

# train and test are assumed to be the Kaggle Titanic data frames
fit <- rpart(Survived ~ Pclass + Sex + Age + SibSp + Embarked,
             data = train, method = "class")

# fit is our decision tree model; let us visualize the tree
plot(fit)
text(fit)
fancyRpartPlot(fit)   # a prettier rendering of the same tree

# Make predictions with the decision tree model
Prediction <- predict(fit, test, type = "class")
submit <- data.frame(PassengerId = test$PassengerId, Survived = Prediction)
write.csv(submit, file = "myfirstdtree.csv", row.names = FALSE)
```
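The terminology list above mentions pruning, and `rpart` supports it through its complexity-parameter (cp) table. A common idiom is to pick the cp value with the lowest cross-validated error and prune back to it; the sketch below uses the `kyphosis` data frame bundled with `rpart`, since the Titanic `train` data is not shipped with R:

```r
library(rpart)

# Grow a classification tree on the bundled kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Cross-validated error for each candidate complexity parameter
printcp(fit)

# Prune back to the cp with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# The pruned tree has the same or fewer nodes than the full tree
plot(pruned)
text(pruned)
```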