Uncovering and correcting misconceptions in online data science content to help you learn more effectively
The data science field is vast and complex, and it often lacks clear-cut answers. While trying to resolve doubts and learn new concepts online, I’ve come across numerous low-quality, error-prone answers, some surprisingly well received despite fundamental misunderstandings. To help others navigate these pitfalls, I’m starting a series to share mistakes found in online content (some of which I made myself in the past).
In this article, I will share 4 such examples, together with a counter-example for each of them to disprove those statements. For Part 1, these examples will centre around basic machine learning and statistics concepts.
The examples will be structured in this way:
Mistake X: <Wrong Statement> <Why it is wrong>
This sentence is incomplete; it should be:
“In Linear Regression (LR), one of the assumptions is that the target Y, conditional on X, must be normally distributed”
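To make the "conditional on X" part concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the exponential distribution for X, the coefficients 1.0 and 3.0, the noise scale, and the use of sample skewness as a rough stand-in for a normality check. The noise is normal by construction, so Y given X is normal, yet Y on its own is far from normal.

```python
import random
from statistics import fmean


def skewness(values):
    """Sample skewness: third standardized moment (near 0 for symmetric data)."""
    mean = fmean(values)
    second = fmean((v - mean) ** 2 for v in values)
    third = fmean((v - mean) ** 3 for v in values)
    return third / second ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50_000

# Assumed data-generating process (illustrative only):
# X is heavily right-skewed, the noise is normal, so Y | X is normal.
x = [rng.expovariate(0.5) for _ in range(n)]            # exponential with mean 2
y = [1.0 + 3.0 * xi + rng.gauss(0.0, 0.5) for xi in x]  # linear signal + normal noise

# Ordinary least squares for a single feature, written out by hand.
x_mean, y_mean = fmean(x), fmean(y)
slope = (
    sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    / sum((xi - x_mean) ** 2 for xi in x)
)
intercept = y_mean - slope * x_mean
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

y_skew = skewness(y)
residual_skew = skewness(residuals)
print(f"skewness of Y (marginal): {y_skew:.2f}")
print(f"skewness of residuals:    {residual_skew:.2f}")
```

If the sketch behaves as intended, the skewness of Y comes out clearly positive (Y inherits the skew of X), while the skewness of the residuals is close to zero. Skewness is only a rough proxy for normality; a Q-Q plot of the residuals would be a more thorough check.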
Let’s recall the definition of LR, albeit in its simplest form: the target Y is…