How to Clean Your Data for Your Real-Life Data Science Projects | by Mythili Krishnan | Dec, 2024


We often hear: “There are packages available to do everything! It takes only 10 minutes to run the models using the packages.” Yes, there are packages, but they work only if you have a clean dataset to feed them. And how long does it take to create, curate, and clean a dataset from multiple sources so that it is fit for purpose? Ask any data scientist who is struggling to build one. Everyone who has spent hours cleaning data, researching, reading and rewriting code, failing and rewriting again will agree with me! This brings us to the point:

‘Real-life data science is 70% data cleaning and 30% actual modeling or analysis’

So let’s go back to basics for a bit and learn how to clean datasets and make them usable for solving business problems more efficiently. We will start this series with the treatment of missing values. Here is the agenda:

  1. What are missing values
  2. What are the causes of missing values in a dataset
  3. Why are missing values important
  4. Approach to deal with missing values
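
Before going through the agenda, here is a rough sketch of what missing values tend to look like in a raw file and how you might count them per column. It uses only the Python standard library. The sample data, the column names, the `count_missing` helper and the set of tokens treated as missing are all made up for illustration; your own sources may mark missing values differently.

```python
import csv
import io

# Hypothetical raw export: blank cells and "NA" both stand for missing values.
RAW_CSV = """customer_id,age,city,monthly_spend
1,34,Pune,120.5
2,,Mumbai,NA
3,41,,87.0
4,29,Delhi,
"""

# Tokens treated as missing here; this set is an assumption, adjust per source.
MISSING_TOKENS = {"", "NA", "N/A", "null", "None"}


def count_missing(csv_text, missing_tokens=MISSING_TOKENS):
    """Return {column: number of missing cells} for a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader fills short rows with None; blanks and tokens also count.
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return counts


missing_per_column = count_missing(RAW_CSV)
for column, n_missing in missing_per_column.items():
    print(f"{column}: {n_missing} missing")
```

A per-column count like this is usually the first thing worth looking at, because it shows where the gaps are concentrated before you decide how to treat them.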
