Outliers treatment

An extreme value or low value compared to other observations is called Outliers or the observation that is very far from other observations.

Parametric Machine Learning Algorithms are very sensitive with Outliers. But why? The first person who get affected with outliers is mean, all parametric machine learning will use mean,so It will impact our predictions.

Let’s say we are calculating the average salary in Bangalore, so we collected some sample from different companies. We collected two different sample one of the sample includes the salary of CEO and the other don’t.See the below image you will see the difference in mean between two samples.

Hope you are clear the effect of outliers.

Reasons for Outliers presence:

There are multiple reasons, listed below.

Humans error during data entry
Malfunction of the measuring equipment
Data extraction or data transformation errors
Natural/real outliers
Sampling errors

Whatever the reasons of outliers, we need to deal with it very carefully.

Types of Outliers:

Univarient
Multi /Bivariant

Univarient: we already seen with example of salary.

In case of Multi/Bivariate, we need to plot the multi dimensional graph to find the outliers.

Let’s take the below data set, which has experience and Salary, if you see in individual variables we don’t see any outliers.But when you see them with some relation like salary based on experience, check the 4th observation the person with 2 years of experience has 80k salary, which is a outlier when compared to others years of experience.

If you see the below scatter plot the highlighted one is an outlier

How to detect outliers:

For small observations it is easy to identify the outliers, but if we have very huge amount of data, how do we do it?

visualization :

We can use below mentioned visualization techniques to identify visually.

Box plot
Histogram
Scatter plot

Statistics:

Outliers treatment

Reasons for Outliers presence:

Types of Outliers:

How to detect outliers:

Published by viswateja3

Leave a comment Cancel reply

Reasons for Outliers presence:

Types of Outliers:

How to detect outliers:

Share this:

Related

Published by viswateja3

Leave a comment Cancel reply