Clustering

Data clustering algorithms, like k-means or ISODATA-algorithms [38], typically strive to minimize the distances within clusters and maximize the distances between them. The distance measure may be based on single linkage, where the distance of a cluster X from another cluster Y is the minimum of the distances between their members x in X and y in Y, or on complete linkage, where the distance is the maximum thereof:

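\begin{eqnarray}
d_s(X, Y) &=& \min \{ d(x, y) \mid x \in X, \; y \in Y \} \\
d_c(X, Y) &=& \max \{ d(x, y) \mid x \in X, \; y \in Y \}
\end{eqnarray}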
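As a minimal sketch of the two linkage measures described above, the following Python fragment computes both for two small point sets. The function names, the choice of Euclidean distance as the point-to-point measure, and the sample points are assumptions made for illustration; they do not come from the original text.

```python
import math
from itertools import product


def single_linkage(X, Y):
    """Smallest distance over all pairs (x, y) with x in X and y in Y."""
    # Euclidean distance is an assumption; any point-to-point measure works.
    return min(math.dist(x, y) for x, y in product(X, Y))


def complete_linkage(X, Y):
    """Largest distance over all pairs (x, y) with x in X and y in Y."""
    return max(math.dist(x, y) for x, y in product(X, Y))


# Two hypothetical clusters on a line, chosen so the distances are easy to check.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(X, Y))    # closest pair: (1, 0) and (4, 0)
print(complete_linkage(X, Y))  # farthest pair: (0, 0) and (6, 0)
```

With these points the single linkage distance is 3.0 and the complete linkage distance is 6.0, so the two measures can differ considerably even for compact clusters.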

A defect of single linkage is that the clusters easily become long chains even if this is uncharacteristic of the data. Complete linkage, on the other hand, may be overly restrictive. An ideal cluster distance measure would perhaps lie somewhere between complete and single linkage. The measure would take into account all the points in the cluster, but they would be weighted appropriately. This way the measure would both take into account the sample distances and keep the shape of the cluster relatively smooth. The SOM algorithm can be shown to use such a measure implicitly [24].
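To make the idea of an in-between measure concrete, here is one simple possibility: a weighted mean of all pairwise distances, which reduces to ordinary average linkage when the weights are uniform. This is only an illustrative sketch and is not the measure the SOM uses implicitly; for that, see [24]. The function name, the optional weight argument, and the sample points are hypothetical.

```python
import math
from itertools import product


def weighted_linkage(X, Y, weight=None):
    """Weighted mean of the distances over all pairs (x, y), x in X, y in Y.

    With weight=None every pair counts equally (average linkage).
    The weight function is a hypothetical hook for other weightings.
    """
    if weight is None:
        weight = lambda x, y: 1.0
    pairs = list(product(X, Y))
    total_weight = sum(weight(x, y) for x, y in pairs)
    weighted_sum = sum(weight(x, y) * math.dist(x, y) for x, y in pairs)
    return weighted_sum / total_weight


# Hypothetical clusters on a line; pairwise distances are 4, 6, 3 and 5.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(4.0, 0.0), (6.0, 0.0)]

print(weighted_linkage(X, Y))
```

For these points the uniform-weight result is 4.5, which lies between the smallest pairwise distance (3.0, single linkage) and the largest (6.0, complete linkage). Because every point contributes, a single stray point has less influence than it does under either extreme.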

Juha Vesanto
Tue May 27 12:40:37 EET DST 1997