Outlier Detection Using Random Forest Regressors: Leveraging Algorithm Strengths to Your Advantage | by Michael Zakhary

Using a model’s robustness to outliers to detect them

The problem of outlier detection can be tricky, especially if the ground truth or the description of what is an outlier is ambiguous or based upon multiple factors. Mathematically speaking, an outlier can be defined as data points more than three standard deviations away from a mean. However, in most real-life problems, not all data points away from a mean are of the same significance, sometimes we require a bit more nuance when flagging outliers.

Let’s take a quick example:

We have a dataset of water consumption per household. By analyzing the water consumption as a whole and isolating points 3 standard deviations from the mean, we can quickly get the outliers that use the most water.

This however fails to take into account the reason behind the increase in consumption, i.e. there could be multiple reasons why the water consumption is high, some reasons are of more interest…

Outlier Detection Using Random Forest Regressors: Leveraging Algorithm Strengths to Your Advantage | by Michael Zakhary

Using a model’s robustness to outliers to detect them

Let’s take a quick example:

Recent Articles

Using DistilBERT for Resource-Efficient Natural Language Processing

AWS and DXC collaborate to deliver customizable, near real-time voice-to-voice translation capabilities for Amazon Connect

Firing of 130 CISA staff worries cybersecurity industry

Toe Dipping Into View Transitions

The Latest 40-Inch Amazon Fire TV Is the Best $180 Home Upgrade You Can Make Today

Related Stories

Leave A Reply Cancel reply