In practice, post-pruning an overfit tree tends to be more successful than stopping its growth early, because it is not easy to estimate precisely when to stop growing the tree. In this article we will see how to overcome the overfitting problem using post-pruning. We need to follow the steps below in order to prune a fully grown tree. Here we will… Continue reading “Pruning”
Category Archives: Machine Learning
Logistic regression
Logistic regression comes under supervised learning and is used to solve classification problems. Some real-world use cases are deciding whether a customer is good or bad, predicting defaulters, email spam detection, and fraud detection. That raises a question: why can’t we use linear regression to solve a classification problem? As we have seen above in the uses of logistic regression, we need… Continue reading “Logistic regression”
Mean Median Mode
Mean: We have probably all heard the word “average” in day-to-day life. In statistics the average is the mean: you add up all the numbers and then divide by the count of numbers (data points). Median: The median is the “middle” value in the list of numbers. To find the median, your numbers (data points) have to be… Continue reading “Mean Median Mode”
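As a rough illustration of the definitions in this excerpt, here is a minimal Python sketch. The data values are made up for the example, and the mode (named in the post title but not described in the excerpt) is taken to be the most frequent value.

```python
import statistics

# Hypothetical data points, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

# Mean: add up all the numbers, then divide by the count.
mean = sum(data) / len(data)

# Median: the middle value of the sorted data (the average of the two
# middle values when the count is even).
median = statistics.median(data)

# Mode: the most frequently occurring value.
mode = statistics.mode(data)

print(mean, median, mode)
```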
Variance and Standard Deviation
Variance: Variance is a measurement of the spread between numbers in a given data set. Below is the formula to calculate variance; just remember that we need to divide by n-1 (where n is the count of observations) when we are working with a sample rather than the whole population (click here to learn about Sample vs Population). Standard deviation: Standard deviation measures the spread of… Continue reading “Variance and Standard Deviation”
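The n versus n-1 distinction can be sketched in a few lines of Python. The function names and data below are assumptions made for this example, and the standard deviation is computed as the square root of the variance.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Same sum of squared deviations, but dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)
samp_var = sample_variance(data)

# Standard deviation is the square root of the variance.
pop_std = math.sqrt(pop_var)
samp_std = math.sqrt(samp_var)

print(pop_var, pop_std, samp_var, samp_std)
```

The standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.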
Bootstrap Sample
Before we discuss the bootstrap sample, read about Sampling With Replacement and Sampling Without Replacement. A bootstrap sample is a random sample that is drawn with replacement. Bootstrapping is a resampling method built on sampling with replacement: it generates N samples, and each sample is the same size as the original data set. Let’s… Continue reading “Bootstrap Sample”
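A minimal sketch of drawing bootstrap samples, assuming the original data set is a plain Python list. The function name `bootstrap_samples` and the `seed` parameter are hypothetical, introduced only for this example.

```python
import random

def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples bootstrap samples from data.

    Each sample is drawn with replacement and has the same size as data,
    so a value may appear several times in a sample or not at all.
    """
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]

# Hypothetical data set.
data = [12, 15, 9, 20, 17]

samples = bootstrap_samples(data, n_samples=3, seed=42)
for sample in samples:
    print(sample)
```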
What is Hypothesis
A hypothesis is a tentative insight into the natural world: a concept that is not yet verified but that, if true, would explain certain facts. It is a message expressing an opinion based on incomplete evidence. What is hypothesis testing: checking whether our assumption or hypothesis is true or false. In science and statistics, a proposed and testable explanation between two… Continue reading “What is Hypothesis”
Calculating Two sample Paired/Dependent T test:
This is used to test the statistical difference between two means from two different populations. For example, a two-sample hypothesis test could be used to check whether there is a difference in mean salary between male and female software engineers in the Bangalore area. It is divided into two types. Two… Continue reading “Calculating Two sample Paired/Dependent T test:”
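For the paired (dependent) case named in the title, a rough sketch of the test statistic follows. It assumes the usual paired formulation, in which each observation in one group is matched with one in the other and the test is run on the differences. The function name and data are hypothetical, and the excerpt is truncated before it gives the article's own procedure.

```python
import math
import statistics

def paired_t_statistic(first, second):
    """t statistic for paired samples, computed on the pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the differences (divides by n - 1).
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical measurements on the same four subjects, before and after.
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]

t_stat = paired_t_statistic(before, after)
print(t_stat)
```

The statistic is compared against a t distribution with n - 1 degrees of freedom to decide whether the mean difference is significant.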
Difference between Z-test, F-test, and T-test
Z-Test: A z-test is used for testing the mean of a population against a standard, or for comparing the means of two populations, with large (n ≥ 30) samples, whether you know the population standard deviation or not. It is also used for testing the proportion of some characteristic against a standard proportion, or for comparing the… Continue reading “Difference between Z-test, F-test, and T-test”
Calculating Accuracy
In our previous article we saw how to calculate linear regression by hand. In this article we will discuss how to measure the accuracy of the linear regression model we built. Below are the metrics we need to calculate to check our model's accuracy: R^2 (coefficient of determination), adjusted R^2, MSE, and RMSE. First,… Continue reading “Calculating Accuracy”
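The four metrics listed above can be sketched with their standard textbook definitions. The function name, the `n_features` parameter (the number of predictors, used for adjusted R^2), and the data are assumptions made for this example.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return R^2, adjusted R^2, MSE and RMSE for a set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": rmse}

# Hypothetical actual and predicted values from a one-feature model.
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
```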
How to calculate One sample T test
The one-sample t-test determines whether the sample mean is statistically different from a known or hypothesized population mean. Put another way, the one-sample t-test is used to determine whether a sample comes from a population with a specific mean. Sometimes we might know the population mean, or sometimes population… Continue reading “How to calculate One sample T test”
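A minimal sketch of the one-sample t statistic, using the standard formula t = (sample mean - hypothesized mean) / (s / sqrt(n)), where s is the sample standard deviation. The function name and data are hypothetical.

```python
import math
import statistics

def one_sample_t_statistic(sample, hypothesized_mean):
    """t statistic comparing a sample mean with a hypothesized population mean."""
    n = len(sample)
    sample_mean = sum(sample) / n
    # Sample standard deviation (divides by n - 1).
    s = statistics.stdev(sample)
    standard_error = s / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical sample and hypothesized population mean.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mu_0 = 4

t_value = one_sample_t_statistic(sample, mu_0)
print(t_value)
```

The result is compared against a t distribution with n - 1 degrees of freedom. That lookup is left out here because it needs a t-distribution table or a statistics library.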