The aim of cluster analysis is to partition objects into clusters (subsets, classes). This partition should have the following properties:
- Homogeneity within the clusters -- data that belong to the same cluster should be as similar as possible.
- Heterogeneity between the clusters -- data that belong to different clusters should be as dissimilar as possible.
The similarity or dissimilarity between objects can be measured with, e.g., the Euclidean distance or the Mahalanobis distance. The range (or unit) of the values should be suitably scaled to obtain reasonable distance values. In any case, the results will depend on the selected value range.
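As a small illustration of why scaling matters, the following sketch compares the nearest neighbour of one object before and after rescaling. The data (height in metres, weight in kilograms), the helper names `zscore_columns` and `nearest`, and the choice of z-score standardization are assumptions made for this example; other scalings are equally possible.

```python
import math
import statistics


def zscore_columns(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


def nearest(rows, index):
    """Index of the row with the smallest Euclidean distance to rows[index]."""
    others = [j for j in range(len(rows)) if j != index]
    return min(others, key=lambda j: math.dist(rows[index], rows[j]))


# Hypothetical objects: (height in metres, weight in kilograms).
raw = [(1.60, 60.0), (1.95, 61.0), (1.61, 66.0), (1.80, 90.0)]

raw_nearest = nearest(raw, 0)                     # distance dominated by the weight column
scaled_nearest = nearest(zscore_columns(raw), 0)  # both columns contribute comparably

print(raw_nearest, scaled_nearest)
```

With these made-up values the nearest neighbour of the first object changes from the second object (raw units, where the weight column dominates the distance) to the third object (after standardization).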
The number of clusters k has to be known in advance in order to apply a cluster analysis algorithm. In many applications this number is not known. In this case the cluster analysis is applied multiple times, each time with a different number of clusters. The results are then compared with each other on the basis of quality and validity functions in order to find the best partitioning.
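The comparison step can be sketched as follows. The mean silhouette width is used here as one possible validity function; it is an assumption of this example, not necessarily the function meant above. The points and the three labelings are hard-coded, hypothetical results standing in for clustering runs with k = 2, 3 and 4.

```python
import math
from statistics import fmean


def silhouette(points, labels):
    """Mean silhouette width of a crisp partition with at least two clusters.

    Values lie between -1 and 1; higher means more compact, better separated clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster
        # (the distance of p to itself is 0, so it does not change the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            fmean([math.dist(p, q) for q in other])
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        widths.append((b - a) / max(a, b))
    return fmean(widths)


# Hypothetical data: three visually separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

# Hypothetical partitions, as if produced by running a cluster algorithm for each k.
candidates = {
    2: [0] * 8 + [1] * 4,              # first two groups merged
    3: [0] * 4 + [1] * 4 + [2] * 4,    # one cluster per group
    4: [0, 0, 0, 3] + [1] * 4 + [2] * 4,  # first group split
}

scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the partition with three clusters receives the highest score, because merging groups increases the within-cluster distances and splitting a group decreases the distance to the nearest other cluster.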
There are numerous clustering methods. Two of them are outlined below.
Crisp cluster algorithms
Every object is assigned to exactly one cluster. The k-medoid cluster algorithm can be described in four steps:
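A minimal sketch of one common four-step formulation of k-medoids: (1) choose k initial medoids, (2) assign every object to its nearest medoid, (3) in every cluster pick the member with the smallest total distance to the other members as the new medoid, (4) repeat steps 2 and 3 until the medoids no longer change. The function name `k_medoids`, the initialization with the first k objects, the Euclidean distance and the sample data are assumptions made for this example.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids on a list of distinct points.

    Returns (medoids, labels): medoids are indices into points,
    labels[i] is the cluster number of points[i].
    """
    # Step 1: initial medoids. Taking the first k objects is a simplification
    # chosen for this sketch; it assumes these objects are distinct.
    medoids = list(range(k))

    for _ in range(max_iter):
        # Step 2: assign every object to the cluster of its nearest medoid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]

        # Step 3: the new medoid of a cluster is the member with the
        # smallest summed distance to all members of that cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            new_medoids.append(
                min(
                    members,
                    key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
                )
            )

        # Step 4: stop as soon as the medoids no longer change.
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return medoids, labels


# Hypothetical data: two small groups in the plane.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, labels = k_medoids(data, k=2)
print(medoids, labels)
```

Because the medoids are always objects of the data set, only pairwise distances between objects are needed, so any dissimilarity measure could replace `math.dist` here.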
Fig. 1: Example of a crisp clustering result
Fuzzy cluster algorithms
The objects are assigned to the clusters with a gradual membership. The objective function is minimized with the following procedure:
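A minimal sketch of such a procedure, assuming the objective function is the one of the standard fuzzy c-means algorithm, J = sum over clusters i and objects j of u_ij^m * ||x_j - v_i||^2, with memberships u_ij, cluster centres v_i and fuzzifier m > 1. The memberships and the centres are updated alternately until the centres barely move. The function name `fuzzy_c_means`, the explicitly supplied initial centres, the tolerance and the sample data are assumptions made for this example.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternating optimization for fuzzy c-means.

    Returns (centers, u), where u[j][i] is the membership of points[j] in cluster i.
    The memberships are the ones computed in the last iteration, i.e. from the
    centres before their final (smaller than tol) update.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])

    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        u = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            zero = [i for i, d in enumerate(d2) if d == 0.0]
            if zero:
                # The object coincides with one or more centres: full membership there.
                row = [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]  # d2 is already squared
                total = sum(inv)
                row = [v / total for v in inv]
            u.append(row)

        # Centre update: mean of the objects weighted with u_ij^m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                tuple(sum(wj * p[d] for wj, p in zip(w, points)) / total for d in range(dim))
            )

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, u


# Hypothetical data: two groups and one object lying between them.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
# Hypothetical initial guess: one object from each group.
centers, u = fuzzy_c_means(data, centers=[data[0], data[3]])

for point, row in zip(data, u):
    print(point, [round(v, 2) for v in row])
```

In this toy example the objects inside a group receive a membership close to 1 in their own cluster, while the object between the groups is shared between both clusters. The memberships of every object sum to 1.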
Fig. 2: Gradual assignment of objects to clusters
- Kaufman, L, and Rousseeuw, PJ (1990) Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons Ltd., Chichester, New York, Weinheim.
- Höppner, F, Klawonn, F, Kruse, R, and Runkler, T (1999) Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, John Wiley & Sons Ltd., Chichester, New York, Weinheim.