
Clustering

 

Data clustering algorithms, like k-means or ISODATA algorithms [38], typically strive to minimize the distances within clusters and maximize the distances between clusters. The distance measure may be based on single linkage, where the distance of a cluster X from another cluster Y is the minimum of the distances between their component clusters $x \in X$ and $y \in Y$, or on complete linkage, where the distance is the maximum thereof:

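\begin{eqnarray}
d_{\mathrm{single}}(X, Y) & = & \min_{x \in X,\, y \in Y} d(x, y) \\
d_{\mathrm{complete}}(X, Y) & = & \max_{x \in X,\, y \in Y} d(x, y)
\end{eqnarray}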
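
The two linkage definitions can be illustrated with a minimal sketch. It assumes that cluster members are plain coordinate tuples and that the underlying distance is Euclidean; the function names and the sample data are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def single_linkage(X, Y):
    """Minimum distance over all pairs (x, y) with x in X and y in Y."""
    return min(dist(x, y) for x, y in product(X, Y))


def complete_linkage(X, Y):
    """Maximum distance over all pairs (x, y) with x in X and y in Y."""
    return max(dist(x, y) for x, y in product(X, Y))


# Two small made-up clusters lying on a line (illustrative data only).
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(3.0, 0.0), (6.0, 0.0)]

d_single = single_linkage(X, Y)      # closest pair: (1, 0) and (3, 0)
d_complete = complete_linkage(X, Y)  # farthest pair: (0, 0) and (6, 0)
```

Both functions scan every pair of members, so the cost grows with the product of the cluster sizes; that is acceptable for a sketch but would need care for large clusters.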

A drawback of single linkage is that the clusters easily become long chains, even if this is uncharacteristic of the data. On the other hand, the complete linkage distance measure may be overly restrictive. An ideal cluster distance measure would perhaps lie somewhere between complete and single linkage. Such a measure would take into account all the points in the cluster, but weight them appropriately. This way the measure would both take into account the sample distance and keep the shape of the cluster relatively smooth. The SOM algorithm can be shown to use such a measure implicitly [24].
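
As a rough illustration of a measure lying between the two extremes, the sketch below uses average linkage, in which every pair of points contributes with equal weight. This is a swapped-in, simpler technique and not the weighting that the SOM is said to use implicitly; the Euclidean distance, the function name and the sample data are likewise assumptions made for the example.

```python
from itertools import product
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import fmean   # floating-point arithmetic mean (Python 3.8+)


def linkage_distances(X, Y):
    """Return (single, average, complete) linkage distances between X and Y."""
    pairs = [dist(x, y) for x, y in product(X, Y)]
    # Average linkage weights every pair equally (an assumption of this
    # sketch, not the SOM's implicit weighting).
    return min(pairs), fmean(pairs), max(pairs)


# Two small made-up clusters lying on a line (illustrative data only).
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(3.0, 0.0), (6.0, 0.0)]

d_single, d_average, d_complete = linkage_distances(X, Y)
```

Because a mean of the pairwise distances can never fall below their minimum or exceed their maximum, the average linkage value always lies between the single and complete linkage values. A non-uniform weighting would shift it within that range.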





Juha Vesanto
Tue May 27 12:40:37 EET DST 1997