Stratified sampling

Imbalanced data is one of the major issues in classification problems. Why do we end up with imbalanced data? Say I have 100 customers who hold a credit card: at most 2 or 3% of them may be defaulters, and the remaining 97 to 98% are reliable payers (the defaulters form what is called the minority class), …
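To make the idea concrete, here is a minimal sketch of a stratified train/test split using only the Python standard library. The function name `stratified_split`, the 20% test fraction, and the 3-in-100 defaulter data are assumptions made up for illustration; they are not taken from the full post.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly its share."""
    rng = random.Random(seed)
    # Group row indices by class label (one group per stratum).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction from every class, but at least one row,
        # so the minority class is never missing from the test set.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical data: 3 defaulters (1) and 97 regular payers (0).
labels = [1] * 3 + [0] * 97
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

A plain random split of the same data could easily put all three defaulters on one side; splitting inside each class avoids that.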

Feature selection

In real projects we have a lot of variables/features. Some of them carry the same information (like age and date of birth), and some, like firstName and LastName, add no value during model building. We need to remove such variables, and this process is called feature selection. Let’s take …
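As a rough sketch of the two cases described above, the snippet below drops identifier-like columns by name and then drops any numeric column that is almost perfectly correlated with one already kept. The helper names, the 0.95 threshold, and the toy columns (including `birth_year` standing in for date of birth) are assumptions for illustration only, and the sketch assumes no numeric column is constant.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(columns, identifiers, threshold=0.95):
    """Keep columns that are neither identifiers nor near-duplicates."""
    kept = []
    for name, values in columns.items():
        if name in identifiers:
            continue  # names and IDs carry no signal for the model
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            continue  # same information as a column we already kept
        kept.append(name)
    return kept

# Hypothetical toy data set.
columns = {
    "firstName": ["Ann", "Bob", "Cid", "Dee"],
    "LastName": ["Ray", "Lee", "Fox", "Kim"],
    "age": [25, 40, 33, 58],
    "birth_year": [1999, 1984, 1991, 1966],
    "income": [30, 52, 75, 41],
}
selected = select_features(columns, identifiers={"firstName", "LastName"})
print(selected)
```

Here `birth_year` is removed because it is a mirror image of `age`, while `income` is kept because it is only weakly correlated with it.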

Bootstrap Sample

Before we discuss the bootstrap sample, read about Sampling With Replacement and Sampling Without Replacement. A bootstrap sample is a random sample drawn with replacement. Bootstrapping is a resampling method that uses sampling with replacement: it generates N samples, each the same size as the original data set. Let’s …
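The definition above can be sketched in a few lines of standard-library Python. The function name `bootstrap_samples` and the five-value data set are made up for illustration.

```python
import random

def bootstrap_samples(data, n_samples, seed=0):
    """Draw n_samples bootstrap samples, each the same size as data."""
    rng = random.Random(seed)
    size = len(data)
    # random.choices samples WITH replacement, so a value can repeat
    # and some values may not appear at all in a given sample.
    return [rng.choices(data, k=size) for _ in range(n_samples)]

data = [2, 4, 6, 8, 10]  # hypothetical original data set
samples = bootstrap_samples(data, n_samples=3)
for s in samples:
    print(s)
```

Sampling without replacement at the same size would just return a shuffled copy of the data, which is why the "with replacement" part matters.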

Random Forest

Before we discuss bagging and random forest, we have to understand the bootstrap sample. Bagging, also called bootstrap aggregating, usually gives better accuracy than a single decision tree because it reduces variance. Bagging is easy to follow once you know how a decision tree and a bootstrap sample work. It uses greedy search with split criteria such as entropy, Gini, …
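Here is a minimal sketch of bagging, assuming a one-split "decision stump" as a stand-in for a full decision tree and training error (rather than entropy or Gini) as the split criterion. All names and the one-dimensional toy data are hypothetical. Each model is trained on its own bootstrap sample, and the final prediction is a majority vote.

```python
import random
from collections import Counter

def fit_stump(points):
    """Greedy search: pick the threshold with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in points}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample of the training points."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(points, k=len(points)))
            for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical data: (feature, label) pairs; small x is class 0.
points = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(points)
print(bagging_predict(models, 1), bagging_predict(models, 9))
```

A random forest follows the same pattern with real decision trees, and additionally considers only a random subset of features at each split.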