• A
  • A
  • A
  • АБB
  • АБB
  • АБB
  • А
  • А
  • А
  • А
  • А
Обычная версия сайта

Статья

Clustering of modal-valued symbolic data

Batagelj V., Kejzar N., Korenjak–Cerne S.

Symbolic data analysis is based on special descriptions of data known as symbolic objects (SOs). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special type of SO is a representation with frequency or probability distributions (modal values). This representation enables us to simultaneously consider variables of all measurement types during the clustering process. In this paper, we present the theoretical basis for compatible leaders and agglomerative clustering methods with alternative dissimilarities for modal-valued SOs. The leaders method efficiently solves clustering problems with large numbers of units, while the agglomerative method can be applied either alone to a small data set, or to leaders, obtained from the compatible leaders clustering method. We focus on (a) the inclusion of weights that enables clustering representatives to retain the same structure as if clustering only first order units and (b) the selection of relative dissimilarities that produce more interpretable, i.e., meaningful optimal clustering representatives. The usefulness of the proposed methods with adaptations was assessed and substantiated by carefully constructed simulation settings and demonstrated on three different real-world data sets gaining in interpretability from the use of weights (population pyramids and ESS data) or relative dissimilarity (US patents data).