Imagine you have 100 images of red and green apples, and your goal is to build a model that can tell the difference between them. This is a supervised learning task (but we’ll save the deeper explanation of that for another post).
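A common convention is to split the dataset 80/20: 80% for training and 20% for testing. With our 100 images, this means:
- 80 images will be used for training (a small slice of these is often held back for validation, to check progress while the model learns).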
- 20 images will be used for testing.
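To make the split concrete, here is a minimal sketch in plain Python. The file names and the `split_dataset` helper are made up for illustration; in a real project you would more likely reach for a library utility, but the idea is the same: shuffle first, then cut.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)                     # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 100 apple images.
images = [f"apple_{i:03d}.jpg" for i in range(100)]

train_images, test_images = split_dataset(images)
print(len(train_images), len(test_images))  # 80 20
```

Shuffling before cutting matters: if the first 50 files were all red apples and the last 50 all green, an unshuffled split would put only green apples in the test set.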
Think of training like teaching a kid to identify apples.
- You show him the first image: a red apple, and tell him, “This is red.”
- Next, you show him a green apple and say, “This is green.”
By going through all 80 training images like this, your “kid” (the model) learns to distinguish between red and green apples. Generally, the more images you show him, and the more varied they are, the better he tends to get at recognizing the patterns that separate the two.
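If you're curious what “showing labeled examples” could look like in code, here is a toy sketch. It is not how real image models work: it assumes each image has already been boiled down to a single average (R, G, B) color, it uses made-up data instead of real photos, and the “learning” is just averaging the colors seen for each label. All names here (`make_example`, `train`, `predict`) are hypothetical.

```python
import random

def make_example(label, rng):
    """Return a fake average (r, g, b) colour for an apple, plus its label."""
    base = (200, 40, 40) if label == "red" else (60, 180, 60)
    # Add a little noise so no two "apples" look exactly alike.
    color = tuple(min(255, max(0, c + rng.randint(-30, 30))) for c in base)
    return color, label

def train(examples):
    """Learn one average colour per label from (colour, label) pairs."""
    sums, counts = {}, {}
    for color, label in examples:
        total = sums.setdefault(label, [0, 0, 0])
        for i, channel in enumerate(color):
            total[i] += channel
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}

def predict(model, color):
    """Pick the label whose learned average colour is closest to `color`."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(color, center))
    return min(model, key=lambda label: sq_dist(model[label]))

rng = random.Random(0)
# 80 made-up training examples: 40 red apples and 40 green apples.
training_set = ([make_example("red", rng) for _ in range(40)]
                + [make_example("green", rng) for _ in range(40)])

model = train(training_set)            # "This is red." "This is green." ... 80 times
print(predict(model, (210, 50, 45)))   # red
print(predict(model, (70, 190, 65)))   # green
```

The `train` step is the “teaching” part: every example comes with its answer. The `predict` step is the kid being shown a new apple and asked to name its color on his own.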