Gower’s Distance for Mixed Categorical and Numerical Data | by Haden Pelletier | Jul, 2024

A distance measure for clustering mixed data

You have most likely heard of Manhattan distance or Euclidean distance. Both are metrics that measure how distant (or different) two given data points are.

Manhattan and Euclidean distance graphed. Image by author

In a nutshell, Euclidean distance is the straight-line distance from point A to point B, which is the shortest possible path between them. Manhattan distance is the sum of the absolute differences between the x coordinates and between the y coordinates of the two points: the distance you would travel on a grid where you can only move up, down, left, or right (not diagonally).
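
As a minimal sketch of the two definitions above, the snippet below computes both distances for a pair of 2D points. The function names and the sample points are illustrative assumptions, not taken from the article.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Grid distance: sum of the absolute differences per coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical sample points, chosen so the differences are 3 and 4.
point_a = (1.0, 2.0)
point_b = (4.0, 6.0)

euclid = euclidean_distance(point_a, point_b)  # sqrt(3^2 + 4^2) = 5.0
manhat = manhattan_distance(point_a, point_b)  # |3| + |4| = 7.0

print(f"Euclidean: {euclid}, Manhattan: {manhat}")
```

For any pair of points, the Manhattan distance is at least as large as the Euclidean distance, since the grid path can never be shorter than the straight line.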

Distance metrics often underlie clustering algorithms such as k-means, which uses Euclidean distance. This makes sense: to define clusters, you first have to know how similar or different two data points are (that is, how distant they are from each other).
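
To illustrate how a distance metric feeds into clustering, here is a rough sketch of only the assignment step of k-means: each point is labeled with the index of its nearest centroid by Euclidean distance. The centroids, points, and the helper name `nearest_centroid` are hypothetical, and a full k-means implementation would also recompute the centroids and repeat until the labels stop changing.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]  # math.dist needs Python 3.8+
    return distances.index(min(distances))

# Hypothetical centroids and data points for illustration only.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (4.0, 4.0)]

# Assignment step: label each point with its nearest centroid.
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)
```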

Calculating the distance between 2 points

To show this process in action, I will start with an example using Euclidean distance.
