Why Clustering Fails. And how to fix it | by Ryan Feather | Jul, 2024


Towards Data Science

You had a data interpretation problem, so you tried clustering. Now you have a cluster interpretation problem! You suspected that patterns might exist in the data, and it was reasonable to hope that imposing some structure through unsupervised learning would yield insights. Clusters are the go-to tool for finding structure, so you embarked on your journey. You spent considerable money on compute. You invested a lot of sweat in fiddling with tuning parameters. Just to be sure, you tried a few algorithms. But at the end of the day, you were left with rainbow plots of clustered data that might have some meaning (just maybe) if you squinted hard enough. You went home with an uneasy suspicion that it was all for naught. Sadly, this is too often the case. Why should this be, though?

Some real clusters. Image released into the public domain by NASA and STScI.

When a clustering project fails to produce value, the failure often stems from one of three causes: a poor understanding of the data, too little attention to the desired outcome, or a poor choice of tools. We'll walk through each of these in turn. To motivate the discussion, it helps to understand why clustering techniques exist in the first place. To get there, we'll review what clustering is and a few of the problems that prompted the development of clustering techniques.
