Sampling With/Without Replacement

Sampling is the part of inferential statistics that is used to estimate population characteristics from sample data, and it is one of the important techniques in statistics and machine learning. In this post we will learn about sampling with replacement and without replacement. Sampling with replacement: Let’s take an example. We have the list below … Continue reading “Sampling With/Without Replacement”
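
The difference between the two schemes can be sketched with Python's standard `random` module. This is only an illustration: the `population` list and the seed are assumptions, since the list used in the post is not shown in this excerpt.

```python
import random

# Hypothetical population; the post's own list is not shown in the excerpt.
population = [1, 2, 3, 4, 5, 6]

rng = random.Random(42)  # fixed seed so the run is repeatable

# With replacement: every draw is independent, so a value can appear
# more than once and the sample can be larger than the population.
with_replacement = rng.choices(population, k=8)

# Without replacement: each item can be drawn at most once,
# so k cannot exceed len(population).
without_replacement = rng.sample(population, k=4)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Because eight values are drawn from six items, the first sample must contain at least one repeat, while the second never does.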

Analysis of variance (ANOVA)

Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. Example 1: Let’s say there are a couple of colleges in your area and you want to know which college gives the best performance (in this case, students from the different colleges all took the same exam). Example 2: Let’s say I have three … Continue reading “Analysis of variance (ANOVA)”

One way ANOVA calculation

Let’s calculate one-way ANOVA with the dataset below. Assumptions: Null hypothesis H0: µ1 = µ2 = µ3. Alternative hypothesis Ha: at least one group mean differs from the others. Calculate the mean. Grand mean: the mean of all observations from all samples (equal to the mean of the sample means when the samples are the same size). Between-group variability: In the image below, the two different samples … Continue reading “One way ANOVA calculation”
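
The calculation outlined above (group means, grand mean, between-group and within-group variability) can be sketched in plain Python. The three `groups` are made-up numbers, not the post's dataset, and the function name `one_way_anova_f` is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)  # mean of every observation

    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical samples chosen so the arithmetic is easy to follow by hand.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)
print(f_stat)  # approximately 13 (SSB = 26 on 2 df, SSW = 6 on 6 df)
```

The F statistic is then compared with the critical value of the F distribution for (2, 6) degrees of freedom at the chosen significance level.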

Data Transformations

In real-world data, most variables are not normally distributed, yet most parametric statistical tests (ANOVA, t-test, regression, etc.) are based on the assumption that the data are normally distributed. If the data are not normally distributed, the assumptions of these tests are not met; in this case the results will … Continue reading “Data Transformations”

Simple Moving Average

The moving average, also called the simple moving average (SMA), is a widely used technique for finding the direction of a trend from past data. It is widely used for forecasting long-term trends. We will calculate a three-year moving average with the data set below. A three-year moving average for the above data set means we … Continue reading “Simple Moving Average”
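
A three-period moving average can be sketched in a few lines of Python. The `yearly_values` list is invented for illustration, because the post's data set is not included in this excerpt.

```python
def simple_moving_average(values, window=3):
    """Average each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical yearly figures (not the post's data).
yearly_values = [3, 6, 9, 12, 15]
sma = simple_moving_average(yearly_values, window=3)
print(sma)  # [6.0, 9.0, 12.0]
```

Five observations with a window of three give three averages, since the first two years do not yet have a full window behind them.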

Kendall Rank Correlation

Rank correlation measures whether, when two variables are ranked, a change in the rank of one is accompanied by a change in the same (positive) or opposite (negative) direction in the rank of the other, when measured across two points. Don’t worry if you still don’t understand; we will find the Kendall rank correlation using the dataset below. We are trying to see if there is any correlation if size of … Continue reading “Kendall Rank Correlation”
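
As a rough sketch, Kendall's tau can be computed by counting concordant and discordant pairs. The version below is the simple tau-a form, which assumes there are no ties; the variables `x` and `y` are hypothetical and are not the post's dataset.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical paired observations without ties.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 give 0.6
```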

Chi Square

We know correlation is used to check the relationship between two continuous variables. We should also have some mechanism to check the relationship between two categorical variables, and that is the chi-square test. Steps to check the relationship between two categorical variables: define the hypothesis, define alpha, find the degrees of freedom, define the decision rule, calculate the … Continue reading “Chi Square”
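
The steps listed above can be sketched for a small contingency table as follows. The `observed` counts are invented, and the critical value 3.841 is the standard chi-square table value for alpha = 0.05 with one degree of freedom; a different alpha or table size would need a different critical value.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Hypothetical 2x2 table of counts for two categorical variables.
observed = [[20, 30],
            [30, 20]]
chi2, dof = chi_square_statistic(observed)

CRITICAL_VALUE = 3.841  # chi-square table value for alpha = 0.05, dof = 1
reject_null = chi2 > CRITICAL_VALUE
print(chi2, dof, reject_null)  # 4.0 1 True
```

Here every expected count is 25, so the statistic is 4.0, which is above the critical value, and the null hypothesis of independence would be rejected at the 5% level.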

Spearman Rank Correlation

Rank correlation measures whether, when two variables are ranked, a change in the rank of one is accompanied by a change in the same (positive) or opposite (negative) direction in the rank of the other, when measured across two points. Don’t worry if you still don’t understand; we will find the Spearman rank correlation using the dataset below. We are trying to see if there is any correlation if size of … Continue reading “Spearman Rank Correlation”
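
A minimal sketch of Spearman's rank correlation, using the textbook formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only when there are no tied values. The data in `x` and `y` and the helper names are assumptions made for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations without ties.
x = [10, 20, 30, 40, 50]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
print(rho)  # squared rank differences sum to 4, so rho is about 0.8
```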

Factorials and Combination

We have probably heard about factorials since childhood, and computing them is also one of the classic examples when learning any programming language. In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n. For example, 5! = 5 * 4 * 3 * 2 * 1 = 120, 4! … Continue reading “Factorials and Combination”
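
The definition above translates directly into a short loop. The sketch below also derives the number of combinations from factorials, using the standard formula n! / (r! * (n - r)!); the function names are illustrative, and Python's built-in `math.factorial` and `math.comb` (Python 3.8+) are used only to cross-check the results.

```python
import math

def factorial(n):
    """Product of all positive integers less than or equal to n."""
    if n < 0:
        raise ValueError("factorial is only defined for non-negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def combination_count(n, r):
    """Number of ways to choose r items from n when order does not matter."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(factorial(5))             # 120
print(combination_count(5, 2))  # 10

# Cross-check against the standard library.
assert factorial(5) == math.factorial(5)
assert combination_count(5, 2) == math.comb(5, 2)
```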

Synthetic Minority Over-sampling Technique (SMOTE)

Imbalanced data is one of the main issues in classification problems. Why do we have imbalanced data? Let’s say I have 100 customers who hold credit cards; at most 2 or 3% may be defaulters, and the remaining 97 to 98% are perfect payers (this is called the presence of a minority class), … Continue reading “Synthetic Minority Over-sampling Technique (SMOTE)”
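
The core idea behind SMOTE is to create new minority-class points by interpolating between an existing minority point and one of its nearest minority neighbours. The sketch below is a simplified, standard-library illustration of that idea only, not the implementation found in any particular library; the function name `smote_sketch`, the sample points, and the parameters are all assumptions.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    points and their k nearest minority neighbours (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k closest other minority points, by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Each synthetic point lies on the line segment between two real minority points, so the new samples stay inside the region already occupied by the minority class.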