Category Archives: Decision Trees

Deriving Decision Tree using Entropy (ID3 approach)

We take the famous dataset below, which is widely used to explain the decision tree algorithm. Once we build a decision tree, it looks like the one shown below. Now we will see how we arrived at that decision tree using the Entropy and Information Gain metrics. The first and foremost question is: how do I choose my root node? Continue reading "Deriving Decision Tree using Entropy (ID3 approach)"
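The post's own dataset and tree are not shown in this excerpt, so here is only a rough sketch of how ID3 picks a root node: compute the entropy of the target, then the information gain of each candidate attribute, and choose the attribute with the highest gain. The column names and rows below are placeholder assumptions, not the post's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder

# Placeholder rows standing in for the post's dataset.
rows = [
    {"Outlook": "Sunny",    "Play": "No"},
    {"Outlook": "Sunny",    "Play": "No"},
    {"Outlook": "Overcast", "Play": "Yes"},
    {"Outlook": "Rain",     "Play": "Yes"},
    {"Outlook": "Rain",     "Play": "No"},
]

# ID3 would pick the attribute with the highest information gain as the root node.
print(information_gain(rows, "Outlook", "Play"))
```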
Sampling With/Without Replacement
Sampling is a part of inferential statistics used to estimate a population based on sample data, and it is one of the most important techniques in statistics and machine learning. In this post we will learn about sampling with replacement and without replacement. Sampling with replacement: let's take an example, we have the list below… Continue reading "Sampling With/Without Replacement"
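The post's example list is cut off in this excerpt, so as a minimal sketch of the difference (the list below is a placeholder), Python's standard library exposes both kinds of sampling:

```python
import random

items = ["A", "B", "C", "D", "E"]  # placeholder list, not the post's data
random.seed(42)                    # for reproducibility

# Sampling WITH replacement: the same item can be picked more than once.
with_replacement = random.choices(items, k=5)

# Sampling WITHOUT replacement: every picked item is unique.
without_replacement = random.sample(items, k=5)

print(with_replacement)     # may contain duplicates
print(without_replacement)  # a permutation of the 5 items, no duplicates
```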
Measure of impurity
In a given dataset that contains classes for the predicted/dependent variable (like Yes, No, Neutral, etc.), we can measure the homogeneity or heterogeneity of the table based on those classes. We say a dataset is pure or homogeneous if it contains only a single class (either YES or NO). If a dataset contains several classes, then we say that the… Continue reading "Measure of impurity"
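The excerpt cuts off before the post names its impurity measure, so purely as an illustration, here is one common way to quantify how mixed a set of class labels is (Gini impurity; the labels below are made up):

```python
from collections import Counter

def gini_impurity(labels):
    """0.0 for a pure (single-class) set; larger values mean a more mixed set."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_impurity(["Yes", "Yes", "Yes"]))       # 0.0 -> pure / homogeneous
print(gini_impurity(["Yes", "No", "Yes", "No"]))  # 0.5 -> maximally mixed for two classes
```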
How to calculate Gain Ratio
As we discussed in one of our articles, How and when does the Decision tree stop splitting?, Gain Ratio is a modification of information gain that reduces its bias. Gain ratio overcomes the problem with information gain by taking into account the number of branches that would result before making the split. It corrects information gain… Continue reading "How to calculate Gain Ratio"
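As a hedged sketch of that correction (assuming the usual definition, gain ratio = information gain / split information, with placeholder rows rather than the article's table):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain divided by split information (the entropy of the split itself)."""
    base = entropy([r[target] for r in rows])
    remainder, split_info = 0.0, 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        weight = len(subset) / len(rows)
        remainder += weight * entropy(subset)
        split_info -= weight * math.log2(weight)
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0

rows = [  # placeholder rows, not the article's data
    {"Outlook": "Sunny",    "Play": "No"},
    {"Outlook": "Overcast", "Play": "Yes"},
    {"Outlook": "Rain",     "Play": "Yes"},
]
print(gain_ratio(rows, "Outlook", "Play"))
```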
How and when does the Decision tree stop splitting?
By default, splitting stops when the tree reaches 100% purity, that is, when every child/subset node is homogeneous and contains a single class (all Yes or all No); this leads to an overfitting problem. Simply put, when my algorithm has learned everything from my training data, it will… Continue reading "How and when does the Decision tree stop splitting?"
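One common way to stop splitting before 100% purity is to set pre-pruning limits on the tree. This is a sketch using scikit-learn's parameters; the toy data is made up, and the post may use different settings:

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [2, 0]]  # toy features
y = [0, 0, 1, 1, 1, 0]                                 # toy labels

# Without these limits the tree grows until every leaf is pure, which can overfit.
tree = DecisionTreeClassifier(
    max_depth=3,          # stop after 3 levels of splits
    min_samples_split=4,  # don't split nodes with fewer than 4 samples
    min_samples_leaf=2,   # every leaf must keep at least 2 samples
    random_state=0,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```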
Pruning
In practice, post-pruning overfit trees is more successful because it is not easy to precisely estimate when to stop growing the tree. In this article we will see how to overcome the overfitting problem using post-pruning. We need to follow the steps below in order to prune a fully grown tree. Here we will… Continue reading "Pruning"
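The article's exact pruning steps are cut off in this excerpt. As one hedged example of post-pruning (which may differ from the article's approach), scikit-learn's cost-complexity pruning grows the full tree first and then collapses weak branches via the ccp_alpha penalty:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full tree, then a pruned one with a cost-complexity penalty applied.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

print("full  :", full_tree.get_n_leaves(), "leaves, test acc", full_tree.score(X_test, y_test))
print("pruned:", pruned_tree.get_n_leaves(), "leaves, test acc", pruned_tree.score(X_test, y_test))
```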
Bootstrap Sample
Before we discuss the Bootstrap Sample, read about Sampling With Replacement and Sampling Without Replacement. A bootstrap sample is a random sample drawn with replacement. Bootstrapping is a resampling technique that uses sampling with replacement; it generates N samples, and each sample is the same size as the population. Let's… Continue reading "Bootstrap Sample"
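A minimal sketch of drawing bootstrap samples (the data and the choice of N below are placeholders, not the post's example):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.array([10, 20, 30, 40, 50])  # placeholder data
N = 3                                  # number of bootstrap samples to draw

# Each bootstrap sample is drawn WITH replacement and has the same size as the original data.
bootstrap_samples = [rng.choice(data, size=len(data), replace=True) for _ in range(N)]

for i, sample in enumerate(bootstrap_samples, start=1):
    print(f"bootstrap sample {i}: {sample}")
```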