Outliers and missing values are among the most important issues any data science engineer needs to deal with; we have already discussed outliers. Before talking about how to handle missing values, let's look at the types of missing values: Missing at Random (MAR), Missing Completely at Random (MCAR), and Missing Not at Random (MNAR). Let's take one example… Continue reading “Missing Data Analysis with MICE”
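As a minimal sketch of MICE-style imputation, scikit-learn's `IterativeImputer` (which implements a MICE-like round-robin regression strategy) can fill the gaps. The toy array below is invented for illustration; the second column is simply twice the first, which is what lets each feature be predicted from the other:

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn,
# so this explicit enable-import is required before using it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data with missing entries marked as np.nan.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
])

# Each round, every feature with missing values is regressed
# on the other features, and the estimates are refined iteratively.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

print(np.isnan(X_filled).any())  # False: no missing values remain
```

Note that MICE-style imputation assumes the data are MAR; under MNAR, the imputed values can be systematically biased.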
Category Archives: K-Nearest Neighbor
Similarity
Suppose we have 10 students in a class and you want to find which students are similar. How do we find this? Maybe based on height, colour, marks scored per subject, overall score, and so on. Based on such common attributes, we can say students A and B are similar in… Continue reading “Similarity”
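One common way to turn "similar" into a number is a distance metric over the students' attributes. A minimal sketch with made-up feature vectors (height in cm and overall score; both the students and the values are hypothetical):

```python
import numpy as np

# Hypothetical feature vectors: [height_cm, overall_score].
students = {
    "A": np.array([160.0, 85.0]),
    "B": np.array([162.0, 83.0]),
    "C": np.array([185.0, 40.0]),
}

def euclidean(u, v):
    """Straight-line distance: smaller distance means more similar."""
    return float(np.linalg.norm(u - v))

d_ab = euclidean(students["A"], students["B"])
d_ac = euclidean(students["A"], students["C"])
print(d_ab < d_ac)  # True: A is closer (more similar) to B than to C
```

In practice the features should be put on a comparable scale first (e.g. standardised), otherwise an attribute with large units, like height, dominates the distance.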
K-Nearest Neighbor
K-Nearest Neighbor is a non-parametric supervised learning algorithm. There is no model-building step on the training data; it is an instance-based learner. We can use KNN for both classification and regression problems. One thing to like about KNN is that we need to pass only one main hyperparameter, K (the number of nearest… Continue reading “K-Nearest Neighbor”
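A minimal sketch of KNN classification with scikit-learn, using a tiny invented one-feature dataset (class 0 clustered near x = 1, class 1 near x = 10); K is the single main hyperparameter mentioned above:

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up training points: three samples per class.
X_train = [[1.0], [1.2], [0.8], [9.8], [10.1], [10.3]]
y_train = [0, 0, 0, 1, 1, 1]

# n_neighbors is K: each query is labelled by a majority
# vote among its K nearest training points.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)  # "fitting" just stores the instances

print(knn.predict([[1.1], [9.9]]))  # [0 1]
```

Because prediction is a vote over stored instances, all the real work happens at query time, which is exactly what "instance-based learning" means.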
Synthetic Minority Over-sampling Technique (SMOTE)
Imbalanced data is one of the main issues in classification problems. Why would we have imbalanced data? Say I have 100 customers holding a credit card; at most 2 or 3% may be defaulters, and the remaining 97 to 98% are perfect payers (the defaulters form the minority class)… Continue reading “Synthetic Minority Over-sampling Technique (SMOTE)”
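The core SMOTE idea is to create new minority samples by interpolating between a minority point and one of its nearest minority neighbours. A simplified NumPy-only sketch (not the full algorithm from the paper, and the three "defaulter" points are invented for illustration):

```python
import numpy as np

def smote_sketch(X_min, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling: pick a minority point,
    pick one of its k nearest minority neighbours, and create a
    synthetic point at a random fraction along the line between them."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from point i to every minority point.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        frac = rng.random()  # interpolation fraction in [0, 1)
        new_points.append(X_min[i] + frac * (X_min[j] - X_min[i]))
    return np.array(new_points)

# Hypothetical minority class: 3 defaulters in a 2-D feature space.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 0.8]]
synthetic = smote_sketch(minority, n_new=4)
print(synthetic.shape)  # (4, 2)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the minority region instead of being exact duplicates, which is what distinguishes SMOTE from plain random oversampling.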