And how to fix it
You had a data interpretation problem, so you tried clustering. Now you have a cluster interpretation problem! You suspected that patterns might exist in the data. Reasonably, you hoped that adding some structure through unsupervised learning would yield some insight. Clusters are the go-to tool for finding structure, so you embarked on your journey. You spent considerable money on compute. You invested a lot of sweat in fiddling with clustering tuning parameters. Just to be sure, you tried a few algorithms. But at the end of the day, you were left with rainbow plots of clustered data that might have some meaning, just maybe, if you squinted hard enough. You went home with an uneasy suspicion that it was all for naught. Sadly, this is too often the case. Why should this be, though?
Failing to produce value in a clustering project usually comes down to a few causes: poor understanding of the data, too little attention to the desired outcome, and poor tool choice. We’ll walk through each of these in turn. To motivate the discussion, it helps to understand why clustering techniques exist. To get there, we’ll review what clustering is and a few of the problems that prompted the development of clustering techniques.