Outlier Detection Using Random Forest Regressors: Leveraging Algorithm Strengths to Your Advantage | by Michael Zakhary


Using a model’s robustness to outliers to detect them

Towards Data Science
Photo by Will Myers on Unsplash

The problem of outlier detection can be tricky, especially if the ground truth or the description of what is an outlier is ambiguous or based upon multiple factors. Mathematically speaking, an outlier can be defined as data points more than three standard deviations away from a mean. However, in most real-life problems, not all data points away from a mean are of the same significance, sometimes we require a bit more nuance when flagging outliers.

Image by Rohanukhade

Let’s take a quick example:

We have a dataset of water consumption per household. By analyzing the water consumption as a whole and isolating points 3 standard deviations from the mean, we can quickly get the outliers that use the most water.

This however fails to take into account the reason behind the increase in consumption, i.e. there could be multiple reasons why the water consumption is high, some reasons are of more interest…

Recent Articles

Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here