Understanding Data Splitting in Machine Learning: Training, Validation, and Testing | by Hozaifa Moustafa | Oct, 2024

October 16, 2024

Imagine you have 100 images of red and green apples, and your goal is to build a model that can tell the difference between them. This is a supervised learning task (but we’ll save the deeper explanation of that for another post).

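A common convention is to split the dataset 80/20: 80% for training (which also covers validation) and 20% for testing. With 100 images, this means:

80 images will be used for training and validation.
20 images will be held back for testing and never shown to the model during training.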
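
The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the file names are hypothetical placeholders for the 100 apple images, `split_dataset` is a helper invented for this example, and the 75/25 division of the 80 images into training and validation is an assumption (the post only says the 80 images cover both).

```python
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items and split it into two lists."""
    shuffled = list(items)
    # Shuffle first so the split does not depend on the original file order.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 100 apple images.
images = [f"apple_{i:03d}.jpg" for i in range(100)]

# 80 images for training + validation, 20 for testing.
train_val, test = split_dataset(images, train_fraction=0.8)

# Assumed ratio: hold out a quarter of the 80 images for validation.
train, val = split_dataset(train_val, train_fraction=0.75, seed=0)

print(len(train), len(val), len(test))  # prints: 60 20 20
```

Shuffling before cutting matters: if the files happened to be sorted by color, a plain slice would put mostly one kind of apple in the test set.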

Think of training like teaching a kid to identify apples.

You show him the first image: a red apple, and tell him, “This is red.”
Next, you show him a green apple and say, “This is green.”

By going through 80 images like this, your “kid” (model) learns to distinguish between red and green apples. In general, the more labeled examples you give him, the better he becomes at recognizing the patterns that separate the two.
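
To make the "teaching" step concrete, here is a toy sketch of learning from labeled examples. Everything in it is an assumption made for illustration: each image is reduced to a single made-up average color, the data is synthetic, and the "model" is a simple nearest-centroid rule rather than whatever a real image classifier would use.

```python
import random

def make_example(label, rng):
    """Return a synthetic (r, g, b) average color for a 'red' or 'green' apple."""
    base = (200, 40, 40) if label == "red" else (60, 180, 60)
    return tuple(min(255, max(0, c + rng.randint(-30, 30))) for c in base)

def train(examples):
    """Learn one mean color (centroid) per label from (color, label) pairs."""
    sums, counts = {}, {}
    for color, label in examples:
        acc = sums.setdefault(label, [0, 0, 0])
        for i, channel in enumerate(color):
            acc[i] += channel
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(s / counts[label] for s in acc)
        for label, acc in sums.items()
    }

def predict(centroids, color):
    """Pick the label whose learned centroid is closest to the given color."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], color))
    return min(centroids, key=distance)

rng = random.Random(0)
# 80 labeled training examples: "this is red", "this is green", and so on.
labels = ["red", "green"] * 40
training_set = [(make_example(label, rng), label) for label in labels]

model = train(training_set)
print(predict(model, (210, 50, 35)))  # prints: red
print(predict(model, (70, 170, 65)))  # prints: green
```

Each `(color, label)` pair plays the role of showing the kid one apple and naming its color; `train` is the part where he forms a general idea of what "red" and "green" look like.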
