Building Reliable Models: Understanding and Verifying the Assumptions Behind Multiple Linear Regression
Generally speaking, many data science practitioners use statistical and machine learning models, without giving much thought to the assumptions behind the constructs. Each model, whether it’s a regression model assuming linearity or a decision tree that requires independence among observations, is built upon specific assumptions about the data. Overlooking these assumptions can result in wrong predictions, false results, and in the end, misinformed choices.
In this post, we will go through the six assumptions behind the multiple linear regression model. You can get the code in this post in here.
1. Linearity
Assumption: The relationship between the dependent variable and each of the independent variables is linear.
Implication: The model should correctly capture the effect of changes in the independent variables on the dependent variable. The relationship between the features and the target should be additive and linear in nature.