Understanding Data Splitting in Machine Learning: Training, Validation, and Testing | by Hozaifa Moustafa | Oct, 2024


Imagine you have 100 images of red and green apples, and your goal is to build a model that can tell the difference between them. This is a supervised learning task, because every image comes with a known label, "red" or "green" (but we’ll save the deeper explanation of that for another post).

A common choice is to split the dataset 80/20: 80% for training (part of which can be held out for validation) and 20% for testing. With our 100 images, this means:

  • 80 images will be used for training and validation.
  • 20 images will be used for testing.
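
To make the 80/20 split concrete, here is a minimal sketch in plain Python. The file names and the `split_dataset` helper are made up for illustration; in practice you might reach for a library utility instead. The list is shuffled before cutting so that the test set is not just "the last 20 images".

```python
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)                   # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 100 apple images.
images = [f"apple_{i:03d}.jpg" for i in range(100)]

train_val_images, test_images = split_dataset(images)
print(len(train_val_images), len(test_images))  # 80 20
```

The two lists never share an image, which is the whole point: the 20 test images stay unseen until the very end.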

Think of training like teaching a kid to identify apples.

  • You show him the first image: a red apple, and tell him, “This is red.”
  • Next, you show him a green apple and say, “This is green.”

By going through 80 images like this, your “kid” (model) learns to distinguish between red and green apples. In general, the more labeled images you give him, the better he becomes at recognizing the patterns that separate the two.
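
The "show an image, say the label" loop can be sketched in a few lines. This is only a toy, not a real image model: it assumes each image has already been reduced to two made-up numbers (average redness and average greenness), and it uses a simple nearest-centroid rule chosen here purely for illustration.

```python
# Each hypothetical example is ((mean_red, mean_green), label).
train_examples = [
    ((0.9, 0.2), "red"),
    ((0.8, 0.3), "red"),
    ((0.2, 0.9), "green"),
    ((0.3, 0.7), "green"),
]

def train(examples):
    """Learn one average color (centroid) per label from labeled examples."""
    sums, counts = {}, {}
    for (r, g), label in examples:
        sum_r, sum_g = sums.get(label, (0.0, 0.0))
        sums[label] = (sum_r + r, sum_g + g)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sum_r / counts[label], sum_g / counts[label])
        for label, (sum_r, sum_g) in sums.items()
    }

def predict(model, features):
    """Return the label whose learned average color is closest to the input."""
    r, g = features
    return min(
        model,
        key=lambda label: (model[label][0] - r) ** 2 + (model[label][1] - g) ** 2,
    )

model = train(train_examples)
print(predict(model, (0.85, 0.25)))  # red
```

Here `train` plays the role of showing the kid labeled apples, and `predict` is the kid answering on his own for an apple he has not been told about.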
