Linear regression assumptions

  • Linear Assumption. Linear regression assumes that the relationship between your input (independent) and output (dependent) variables is linear. You can use the Pearson correlation coefficient to check for linearity. If there is no linear relationship between the input and output variables, transform the data to make the relationship linear (e.g. a log or Box-Cox transform).
  • Data Cleaning. Linear regression assumes that your input and output variables are not noisy. Fill or remove missing values, and give thought to outliers (it is usually better to remove them or treat them separately).
  • Multicollinearity. Linear regression will over-fit your data when you have highly correlated input (independent) variables. Consider calculating pairwise correlations for your input data and removing the most correlated. For example, if two input variables record the same distance, one in miles and one in metres, they carry the same information; this is what we call a collinearity problem.
  • Distributions. Linear regression will make more reliable predictions if your input and output variables have a Gaussian distribution. You may get some benefit from applying transforms (e.g. log or Box-Cox) to your variables to make their distributions look more Gaussian. Also check how such transformations affect accuracy.
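To illustrate the linearity check from the first bullet, here is a minimal sketch (not from the post) that computes the Pearson correlation with NumPy; the toy `x`/`y` data is made up for demonstration:

```python
import numpy as np

# Toy data: y is a noisy linear function of x, so the Pearson r
# between them should be close to 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0])

# Pearson correlation coefficient between input and output.
r = np.corrcoef(x, y)[0, 1]
```

An `r` far from ±1 suggests the relationship is not linear and a transform may be needed before fitting.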
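For the data-cleaning bullet, one possible approach (an illustration, not the author's recipe) is to fill missing values with the median and drop outliers using a median-absolute-deviation rule; the sample array and the 3-MAD threshold are assumptions:

```python
import numpy as np

data = np.array([1.2, 1.4, np.nan, 1.5, 9.9, 1.3])  # nan is missing, 9.9 is an outlier

# Fill missing values with the median of the observed values.
filled = np.where(np.isnan(data), np.nanmedian(data), data)

# Flag outliers more than 3 (scaled) median absolute deviations from the median.
med = np.median(filled)
mad = np.median(np.abs(filled - med))
keep = np.abs(filled - med) <= 3 * 1.4826 * mad
cleaned = filled[keep]
```

The 1.4826 factor scales the MAD so it is comparable to a standard deviation under a Gaussian; mean imputation or separate treatment of the outliers are equally valid choices.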
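The multicollinearity check described above (pairwise correlations, including the miles-vs-metres example) can be sketched like this; the synthetic features and the 0.95 threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
miles = rng.uniform(1, 100, size=50)
meters = miles * 1609.34            # same information in different units
other = rng.uniform(0, 1, size=50)  # unrelated feature

X = np.column_stack([miles, meters, other])
corr = np.abs(np.corrcoef(X, rowvar=False))

# Collect feature pairs whose absolute correlation exceeds a threshold;
# one variable from each such pair is a candidate for removal.
i, j = np.triu_indices_from(corr, k=1)
collinear = [(int(a), int(b)) for a, b in zip(i, j) if corr[a, b] > 0.95]
```

Here the miles/metres pair (columns 0 and 1) is perfectly correlated, so it is flagged while the unrelated feature is not.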
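Finally, a small sketch of the distributions bullet: a log transform applied to right-skewed (here, synthetic log-normal) data pulls the distribution toward Gaussian, which we can see by comparing sample skewness before and after. The data and the skewness helper are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # heavily right-skewed

def skewness(a):
    # Sample skewness: the third standardized moment (0 for a symmetric distribution).
    a = np.asarray(a, dtype=float)
    return float(np.mean(((a - a.mean()) / a.std()) ** 3))

# A log transform pulls in the long right tail.
y_log = np.log(y)
```

`skewness(y_log)` lands near 0 while `skewness(y)` is large and positive; a Box-Cox transform (e.g. `scipy.stats.boxcox`) generalizes this by fitting the transform parameter, at the cost of making predictions harder to interpret on the original scale.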

Published by viswateja3
