Let’s calculate linear regression for the dataset below. We have the age of each person, which we will denote as X, and the sugar level of each person, which we will denote as Y. Step 1) To learn how to calculate the mean, refer to the link. Step 2) To learn how to calculate the standard deviation, refer to the link. Step 3) … Continue reading “How to calculate Simple Linear Regression”
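The post's dataset and its remaining steps are not shown in this excerpt, so the following is only a minimal sketch of simple linear regression using the standard least-squares formulas. The ages and sugar levels are made-up illustrative numbers, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, NOT the dataset from the post.
ages = [20, 30, 40, 50]            # X
sugar_levels = [90, 100, 110, 120]  # Y

slope, intercept = fit_line(ages, sugar_levels)
print(slope, intercept)
```

With the fitted values, a prediction for a new age is simply `slope * age + intercept`.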
Author Archives: viswateja3
Cost/Loss function
Let’s say we started a mobile manufacturing company last month and we want a new cost study to improve next month’s budget forecasts. We pay $500 for rent and $100 for the electricity bill, manufacturing each mobile costs $50, and our budget is $4500. The equation of the cost function is C(x) = FC + … Continue reading “Cost/Loss function”
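The equation is cut off in this excerpt. As a rough sketch only, assuming it takes the common linear form of a fixed cost plus a per-unit cost times the number of units, the budget arithmetic could look like this. The names `total_cost` and `max_units_within_budget` are illustrative.

```python
FIXED_COST = 500 + 100   # rent + electricity bill, in dollars (from the text)
COST_PER_MOBILE = 50     # manufacturing cost per mobile, in dollars
BUDGET = 4500

def total_cost(units):
    # ASSUMPTION: linear cost, i.e. fixed cost plus per-unit cost times units.
    return FIXED_COST + COST_PER_MOBILE * units

def max_units_within_budget(budget):
    # Largest whole number of mobiles whose total cost does not exceed the budget.
    return (budget - FIXED_COST) // COST_PER_MOBILE

units = max_units_within_budget(BUDGET)
print(units, total_cost(units))
```

Under that assumption, the budget left after fixed costs is divided by the per-unit cost to get the number of mobiles that can be built.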
How and when does the Decision tree stop splitting?
By default, splitting stops when the tree reaches 100% purity, that is, when each child (subset) node is homogeneous and holds a single class (in other words, the node is pure: all of its labels are either Yes or No). This leads to the overfitting problem. Put simply, when my algorithm has learned everything from my training data, it will … Continue reading “How and when does the Decision tree stop splitting?”
Pruning
In practice, post-pruning an overfit tree is more successful than stopping its growth early, because it is not easy to estimate precisely when to stop growing the tree. In this article we will see how to overcome the overfitting problem using post-pruning. We need to follow the steps below to prune a fully grown tree. Here we will … Continue reading “Pruning”
Logistic regression
Logistic regression comes under supervised learning and is used to solve classification problems. Some real-world use cases are deciding whether a customer is good or bad, predicting defaulters, email spam detection, fraud detection, etc. Great, but I have a question: why can’t we use linear regression to solve a classification problem? As we have seen above in the uses of logistic regression, we need … Continue reading “Logistic regression”
Mean Median Mode
Mean: We have probably heard of the “average” in our day-to-day life. In statistics, the average is the mean: you add up all the numbers and then divide by the count of numbers (data points). Median: The median is the “middle” value in a list of numbers. To find the median, your numbers/data points have to be … Continue reading “Mean Median Mode”
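A small sketch of these definitions, using made-up numbers. The helper names `mean` and `median` are illustrative; the mode comes from Python's standard `statistics` module.

```python
import statistics

def mean(values):
    # Add up all the numbers, then divide by how many there are.
    return sum(values) / len(values)

def median(values):
    # The middle value of the sorted list; with an even count,
    # the average of the two middle values.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [3, 1, 4, 1, 5]  # hypothetical data points
print(mean(data))             # 2.8
print(median(data))           # 3
print(statistics.mode(data))  # 1, the most frequent value
```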
Variance and Standard Deviation
Variance: Variance is a measurement of the spread between numbers in a given data set. Below is the formula to calculate variance; just remember that we need to divide by n-1 (where n is the count of observations) when we are working with a sample rather than the whole population (click here to learn about Sample vs Population). Standard deviation: Standard deviation measures the spread of … Continue reading “Variance and Standard Deviation”
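The formula itself is not visible in this excerpt, so here is a minimal sketch of the standard definitions, showing the n-1 divisor for a sample next to the n divisor for a population. The data values and the `sample` flag are illustrative choices.

```python
import math

def variance(values, sample=True):
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    # Divide by n - 1 for a sample, by n for a whole population.
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    # Standard deviation is the square root of the variance.
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(variance(data, sample=False))  # population variance: 4.0
print(std_dev(data, sample=False))   # population standard deviation: 2.0
print(variance(data))                # sample variance: 32 / 7
```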
Bootstrap Sample
Before we discuss bootstrap samples, read about Sampling With Replacement and Sampling Without Replacement. A bootstrap sample is a random sample drawn with replacement. Bootstrapping is a resampling method based on sampling with replacement: it generates N samples, and each sample is the same size as the data set it is drawn from. Let’s … Continue reading “Bootstrap Sample”
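As an illustration, bootstrap sampling can be sketched with Python's standard `random` module. The function name `bootstrap_samples`, the `seed` parameter, and the data values are assumptions made for this example.

```python
import random

def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples bootstrap samples, each the same size as data."""
    rng = random.Random(seed)
    size = len(data)
    # rng.choices samples WITH replacement, so a value may repeat
    # within a sample and some values may not appear at all.
    return [rng.choices(data, k=size) for _ in range(n_samples)]

data = [10, 20, 30, 40, 50]  # hypothetical data set
for sample in bootstrap_samples(data, n_samples=3, seed=0):
    print(sample)
```

Passing a seed only makes the draws repeatable; leave it as `None` for different samples on each run.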
What is Hypothesis
A hypothesis is a tentative insight into the natural world: a concept that is not yet verified but that, if true, would explain certain facts. It is a statement expressing an opinion based on incomplete evidence. What is hypothesis testing: using data to decide whether our assumption (hypothesis) should be rejected or not. In science and statistics, a proposed and testable explanation between two … Continue reading “What is Hypothesis”
Calculating Two sample Paired/Dependent T test:
This test is used to check for a statistical difference between two means from two different populations. For example, a two-sample hypothesis test could be used to check whether there is a difference in mean salary between male and female software engineers in the Bangalore area. It is divided into two types. Two … Continue reading “Calculating Two sample Paired/Dependent T test:”