Authors: Fleischmann, M.
Journal: Journal of Open Source Software, 8(89), 5240
DOI: 10.21105/joss.05240
Open Access
Abstract
Given a heterogeneous group of observations, researchers often try to find more homogeneous groups within them. A typical approach is to use clustering algorithms, which determine these groups based on statistical similarity. Although there is an extensive range of algorithms to choose from, they often share one specific limitation: the algorithm itself does not determine the optimal number of clusters into which a group of observations should be divided. The decision usually relies on internal cluster validity measures, but these provide only limited insight and can lead to a suboptimal choice. This paper presents a Python package named clustergram, which offers tools to analyze clustering solutions and to visualize how observations behave across a tested range of options for the number of classes. This enables a deeper understanding of how observations split into classes and supports better-informed decisions on the optimal number of classes.
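To make the idea concrete, the following is a minimal pure-Python sketch of the data behind a clustergram-style diagram. It is not the clustergram package's API or implementation. It assumes k-means as the clustering algorithm, summarises each cluster by the unweighted mean of its centroid's features, and uses hypothetical helper names (`kmeans`, `clustergram_data`, `flows`). For every tested number of clusters k, it records the summary value of each observation's cluster, then counts how many observations move from each cluster at k to each cluster at k + 1. A plot would draw those transitions as lines whose width scales with the count.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct observations
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return labels, centroids


def clustergram_data(points, k_range, seed=0):
    """For each k, map every observation to a one-number summary of its cluster."""
    rows = {}
    for k in k_range:
        labels, centroids = kmeans(points, k, seed=seed)
        # Hypothetical summary: unweighted mean of each centroid's features.
        centroid_mean = [sum(c) / len(c) for c in centroids]
        rows[k] = [centroid_mean[lab] for lab in labels]
    return rows


def flows(rows, k):
    """Count observations moving between each (cluster at k, cluster at k + 1) pair."""
    return Counter(zip(rows[k], rows[k + 1]))


# Usage on a small made-up dataset with three visually separated groups.
data = [(1.0, 1.2), (0.9, 1.1), (5.0, 5.1), (5.2, 4.9), (9.0, 9.1), (8.8, 9.2)]
rows = clustergram_data(data, range(1, 4))
for k in (1, 2):
    print(k, "->", k + 1, dict(flows(rows, k)))
```

Because k-means depends on its random initialisation, the exact splits can change with `seed`. In practice, a library implementation such as scikit-learn's `KMeans` would replace the hand-written loop.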