
Map measures

The quality of a SOM is usually evaluated in terms of its resolution and of how well the map preserves the topology of the data set. Alternatively, topology preservation may be replaced by a measure of the smoothness of the map. Other measures of map quality have also been proposed, e.g. by Kraaijveld [26], whose measure is based on the classification accuracy of the map. Unfortunately, it requires labeled input samples, so it is not generally applicable.
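The text does not say how resolution is measured. One common choice, assumed here purely for illustration, is the average quantization error: the mean distance from each input sample to its nearest weight vector. The function name, the toy data and the use of Euclidean distance are all assumptions of this sketch, not something the text specifies.

```python
import math

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its nearest weight vector."""
    total = 0.0
    for x in data:
        # Distance to the best-matching unit of sample x.
        total += min(math.dist(x, w) for w in codebook)
    return total / len(data)

# Hypothetical toy data: three 2-D samples and a two-unit codebook.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
codebook = [(0.0, 0.0), (1.0, 1.0)]
print(quantization_error(data, codebook))
```

A smaller value means the weight vectors sit closer to the data, i.e. a finer resolution under this particular definition.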

An important factor in the possible success of the SOM is the true (or "natural") dimension of the data. If it is larger than the dimension of the map grid, the map may no longer be able to follow the distribution of the data set [41]. In this case topology preservation and resolution of the mapping can become contradictory goals: a mapping with high resolution may fold itself so that the topology is broken, as shown in figure 2.4.

There are two types of folds on the map. In the first type, two vectors far apart in the input space are mapped close to each other on the map grid. This kind of fold is easy to notice in the u-matrix representation of the map. In the second type, two weight vectors close to each other in the input space are mapped far apart on the grid. This is signalled when the two best-matching units (BMUs) of an input vector are not adjacent map units. Folds of this kind are often taken as an indication of topographic error in the mapping. Li discussed different kinds of topological errors in SOMs and proposed that they could be used to infer something about the true topology of the data set [51].
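The second type of fold suggests a simple count: the fraction of input vectors whose two best-matching units are not adjacent on the grid. The sketch below is a minimal illustration under stated assumptions: a rectangular grid, adjacency meaning one step horizontally or vertically, and Euclidean distance in the input space. The function and variable names are hypothetical.

```python
import math

def topographic_error(data, weights, positions):
    """Fraction of samples whose two best-matching units are not grid neighbours.

    weights[i]   : weight vector of unit i in the input space
    positions[i] : (row, col) of unit i on a rectangular grid (assumed layout)
    """
    errors = 0
    for x in data:
        # Rank all units by distance to x; the first two are the BMUs.
        ranked = sorted(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
        first, second = ranked[0], ranked[1]
        (r1, c1), (r2, c2) = positions[first], positions[second]
        # Assumption: units are adjacent when exactly one grid step apart
        # horizontally or vertically (4-neighbourhood).
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(data)

# Hypothetical 1x4 map over a 1-D input space.
positions = [(0, 0), (0, 1), (0, 2), (0, 3)]
data = [(0.02,), (1.4,)]

# Folded map: the last unit lies next to the first one in the input space.
weights = [(0.0,), (1.0,), (2.0,), (0.1,)]
# Ordered map: grid order matches input-space order.
ordered_weights = [(0.0,), (1.0,), (2.0,), (3.0,)]

print(topographic_error(data, weights, positions))
print(topographic_error(data, ordered_weights, positions))
```

In the folded case the sample near 0 has units 0 and 3 as its two BMUs, which are three grid steps apart, so it counts as an error. The ordered map produces no such case.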

The SOM algorithm itself does not provide an estimate of the dimension of a data set. The definition and estimation of the dimension of a data set are discussed in further detail e.g. in [43].

Juha Vesanto
Tue May 27 12:40:37 EET DST 1997