Let’s bring this to life with a familiar scenario — unlocking your phone.
When you set up face unlock, your phone captures a feature vector that represents your face. This vector isn’t an image but rather a compact set of numbers that describe characteristics of your face (like the spacing between your eyes or the shape of your nose).
The next time you try to unlock your phone, it captures another image, processes it into a feature vector, and compares it to the original. The Siamese Network outputs a similarity score — if it’s above a certain threshold, your phone says, “Welcome back!” If not, it says, “Hold up… you’re not my owner.”
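Here’s a minimal sketch of what that comparison might look like, assuming a hypothetical `encoder` network that maps a face image to a feature vector (in a Siamese setup, the same encoder with shared weights processes both images). The threshold value here is just a placeholder — real systems tune it carefully to balance false accepts against false rejects.

```python
import torch
import torch.nn.functional as F

def unlock_phone(encoder, enrolled_image, new_image, threshold=0.8):
    """Compare a new face capture against the enrolled one."""
    with torch.no_grad():
        # The same encoder turns both images into feature vectors.
        enrolled_vec = encoder(enrolled_image)  # shape: (1, embedding_dim)
        new_vec = encoder(new_image)            # shape: (1, embedding_dim)

        # Cosine similarity: close to 1.0 means "very likely the same face".
        similarity = F.cosine_similarity(enrolled_vec, new_vec).item()

    if similarity >= threshold:
        return "Welcome back!"
    return "Hold up... you're not my owner."
```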
And here’s the beauty of this approach: there’s no need to add your face as a new “category” every time. The network doesn’t care about fixed classes; it cares about comparing pairs. This is why it’s incredibly flexible and can generalize well with limited data.