Introduction to Self-Organizing Map

The SOM algorithm resembles other Vector Quantization (VQ) algorithms, such as k-means, and is closely related to principal curves. The important distinction from VQ techniques is that the neurons are organized on a regular grid and, along with the selected neuron, its neighbors are also updated, so that the SOM performs an ordering of the neurons. In this respect the SOM is a multidimensional scaling method which projects data from the input space to a lower-dimensional output space. In the SOM, similar vectors in the input space are projected onto nearby neurons on the map.

Structure of SOM

A SOM is formed of neurons located on a regular, usually 1- or 2-dimensional grid. Each neuron i of the SOM is represented by an n-dimensional weight or reference vector mi = [mi1 mi2 ... min], where n is equal to the dimension of the input vectors. Higher-dimensional grids can be used, but they are not common since their visualization is much more problematic. Usually the map topology is a rectangle, but toroidal topologies have also been used successfully.

Neighborhood relation: The neurons of the map are connected to adjacent neurons by a neighborhood relation, which dictates the structure of the map. The immediately adjacent neurons belong to the 1-neighborhood Ni1 of neuron i. In the 2-dimensional case the neurons of the map can be arranged either on a rectangular or a hexagonal lattice. Neighborhoods of different sizes in rectangular and hexagonal lattices are illustrated in figure 1 below. The number of neurons determines the granularity of the resulting mapping, which affects the accuracy and the generalization capability of the SOM.

Figure 1: Neighborhoods (size 1, 2 and 3) of the unit marked with black dot: (a) hexagonal lattice, (b) rectangular lattice. The innermost polygon corresponds to 1-neighborhood, the second to the 2-neighborhood and the biggest to the 3-neighborhood.
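
As an illustration of the neighborhood relation, the following sketch (in Python with NumPy, not part of the original toolbox) enumerates the units that fall within the k-neighborhood of a given unit on a rectangular lattice; the Chebyshev distance on the grid coordinates reproduces the square rings of figure 1b, and the names used are purely illustrative.

# A minimal sketch of grid neighborhoods on a rectangular lattice (illustrative names).
import numpy as np

rows, cols = 10, 10                      # map size: 10 x 10 = 100 neurons
# grid coordinates r_i of every unit, one (row, col) pair per neuron
grid = np.array([(r, c) for r in range(rows) for c in range(cols)])

def k_neighborhood(i, k):
    """Indices of units within the k-neighborhood of unit i on a rectangular lattice.
    The Chebyshev (chessboard) distance gives the square rings of figure 1b."""
    d = np.max(np.abs(grid - grid[i]), axis=1)
    return np.where(d <= k)[0]

print(k_neighborhood(55, 1))   # unit 55 and its 8 immediate neighbors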

Initialization

Size of map: The topological relations and the number of neurons are fixed from the beginning. The number of neurons should usually be selected as large as possible, with the neighborhood size controlling the smoothness and generalization of the mapping. The mapping does not suffer considerably even when the number of neurons exceeds the number of input vectors, provided the neighborhood size is selected appropriately. However, as the size of the map increases, e.g. to tens of thousands of neurons, the training phase becomes computationally too heavy for most applications.

Initialization algorithms: Before the training phase, initial values are given to the weight vectors. The SOM is robust with respect to the initialization, but a properly chosen initialization allows the algorithm to converge faster to a good solution. Typically one of three procedures is used: random initialization, where the weight vectors are given random values; sample initialization, where the weight vectors are initialized with random samples drawn from the input data set; and linear initialization, where the weight vectors are initialized along the subspace spanned by the two principal eigenvectors of the input data.
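
As a rough illustration, the following Python/NumPy sketch shows how the random and sample-based procedures could be realized for a 10 x 10 map; the array names and sizes are hypothetical, not part of the toolbox.

# A minimal sketch of random and sample-based initialization, assuming the input
# data are the rows of a NumPy array X of shape (n_samples, n).
import numpy as np

rng = np.random.default_rng(0)
n_samples, n = 500, 3
X = rng.random((n_samples, n))           # stand-in for the input data set
n_neurons = 10 * 10                      # a 10 x 10 map

# random initialization: random values within the range of the data
W_random = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_neurons, n))

# sample initialization: copy randomly chosen input vectors
W_sample = X[rng.choice(n_samples, size=n_neurons, replace=False)]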

Training

Best-Matching Unit (BMU): In each training step, one sample vector x is chosen randomly from the input data set, and a similarity measure is calculated between it and all the weight vectors of the map. The Best-Matching Unit (BMU), denoted by c, is the unit whose weight vector has the greatest similarity with the input sample x. The similarity is usually defined by means of a distance measure, typically the Euclidean distance. Formally, the BMU is defined as the neuron for which ||x - mc|| = mini{||x - mi||}, where ||.|| is the distance measure.
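
The BMU search can be illustrated with a short Python/NumPy sketch using the Euclidean distance; the function and variable names are illustrative only.

# A minimal sketch of the BMU search; W holds one weight vector m_i per row,
# x is a single input sample.
import numpy as np

def best_matching_unit(x, W):
    """Return index c of the unit whose weight vector is closest to x."""
    distances = np.linalg.norm(W - x, axis=1)   # ||x - m_i|| for every unit i
    return int(np.argmin(distances))

W = np.random.rand(100, 3)    # 100 neurons, 3-dimensional weight vectors
x = np.random.rand(3)
c = best_matching_unit(x, W)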

Update rule: After finding the BMU, the weight vectors of the SOM are updated. The weight vectors of the BMU and its topological neighbors are moved closer to the input vector in the input space, i.e. the adaptation procedure stretches the BMU and its topological neighbors towards the sample vector. This is illustrated in figure 2 below, where the input vector given to the network is marked by an x. The SOM update rule for the weight vector of unit i is mi(t+1) = mi(t) + hci(t)[x(t) - mi(t)], where t denotes time and hci(t) is the neighborhood kernel around the winner unit c at time t.

Figure 2: Updating the best matching unit (BMU) and its neighbors towards the input sample marked with x. The solid and dashed lines correspond to the situation before and after updating, respectively.
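
The update rule itself can be sketched as follows in Python/NumPy, assuming the kernel values hci(t) have already been computed for every unit; the names are illustrative.

# A minimal sketch of the sequential update m_i(t+1) = m_i(t) + h_ci(t) [x(t) - m_i(t)].
import numpy as np

def update(W, x, h):
    """Move every weight vector towards x, weighted by its kernel value h[i]."""
    return W + h[:, None] * (x - W)

W = np.random.rand(100, 3)            # current weight vectors
x = np.random.rand(3)                 # input sample
h = np.random.rand(100)               # stand-in kernel values h_ci(t)
W = update(W, x, h)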

Neighborhood kernel: The neighborhood kernel is a non-increasing function of time and of the distance of unit i from the winner unit c. It defines the region of influence that the input sample has on the SOM. The kernel is formed of two parts: the neighborhood function h(d,t) and the learning rate function alpha(t): hci(t) = h(||rc-ri||, t) alpha(t), where ri is the location of unit i on the map grid.

Neighborhood function: The simplest neighborhood function is the bubble: it is constant over the whole neighborhood of the winner unit and zero elsewhere (see figure 3a below). Another is the Gaussian neighborhood function h(d,t) = exp(-d^2 / (2 sigma^2(t))), where sigma(t) is the neighborhood radius at time t (see figure 3b below). It gives slightly better results but is computationally somewhat heavier. Usually the neighborhood radius is bigger at first and is decreased linearly to one during the training.

Figure 3: The two basic neighborhood functions: the bubble and the gaussian.
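
The two neighborhood functions, and their combination with the learning rate into the kernel hci(t), can be sketched as follows in Python/NumPy; sigma denotes the neighborhood radius, and all names are illustrative.

# A minimal sketch of the bubble and Gaussian neighborhood functions h(d, t) and of
# the kernel h_ci(t) = h(||r_c - r_i||, t) * alpha(t).
import numpy as np

def bubble(d, sigma):
    """Constant inside the radius sigma around the winner, zero elsewhere."""
    return (d <= sigma).astype(float)

def gaussian(d, sigma):
    """Gaussian neighborhood function exp(-d^2 / (2 sigma^2))."""
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def kernel(grid, c, sigma, alpha, h=gaussian):
    """Kernel value h_ci = h(||r_c - r_i||, t) * alpha(t) for every unit i."""
    d = np.linalg.norm(grid - grid[c], axis=1)   # grid distances ||r_c - r_i||
    return h(d, sigma) * alpha

grid = np.array([(r, c) for r in range(10) for c in range(10)], dtype=float)
h_ci = kernel(grid, c=55, sigma=3.0, alpha=0.5)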

Learning rate: The learning rate alpha(t) is a decreasing function of time. Two commonly used forms are a linear function and a function inversely proportional to time: alpha(t) = A / (t+B), where A and B are some suitably selected constants. Use of the latter function type ensures that all input samples have approximately equal influence on the training result.
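
Both schedules can be written out in a few lines; the constants alpha0, A and B below are illustrative, not recommended values.

# A minimal sketch of the two learning rate schedules mentioned above.
import numpy as np

n_steps, alpha0, A, B = 10000, 0.5, 200.0, 400.0
t = np.arange(n_steps)

alpha_linear = alpha0 * (1.0 - t / n_steps)   # decreases linearly from alpha0 towards 0
alpha_inverse = A / (t + B)                   # inversely proportional to time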

Training phases: The training is usually performed in two phases. In the first phase, a relatively large initial alpha value and neighborhood radius are used. In the second phase, both the alpha value and the neighborhood radius are small right from the beginning. This procedure corresponds to first tuning the SOM approximately to the same space as the input data and then fine-tuning the map. If the linear initialization procedure is used, the first training phase can be skipped. There are several rules of thumb, found through experiments, for choosing suitable values.
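
Putting the pieces together, the following Python/NumPy sketch runs the two training phases with a Gaussian kernel and linearly decreasing learning rate and radius; the phase lengths, initial values and map size are illustrative, not prescribed by the algorithm.

# A minimal sketch of two-phase sequential training (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 3))                               # input data, 3-dimensional
rows, cols = 10, 10
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
W = X[rng.choice(len(X), size=rows * cols, replace=False)]   # sample initialization

# phase 1: rough ordering (large alpha and radius), phase 2: fine-tuning (small values)
phases = [(2000, 0.5, 5.0), (8000, 0.05, 1.0)]         # (steps, alpha0, sigma0)
for steps, alpha0, sigma0 in phases:
    for t in range(steps):
        frac = t / steps
        alpha = alpha0 * (1.0 - frac)                   # linearly decreasing learning rate
        sigma = max(1.0, sigma0 * (1.0 - frac))         # radius decreased linearly to one
        x = X[rng.integers(len(X))]                     # random training sample
        c = np.argmin(np.linalg.norm(W - x, axis=1))    # best-matching unit
        d = np.linalg.norm(grid - grid[c], axis=1)      # grid distances to the BMU
        h = np.exp(-d ** 2 / (2.0 * sigma ** 2)) * alpha    # Gaussian kernel times alpha
        W += h[:, None] * (x - W)                       # update rule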

Variants

There are many variants of the basic SOM. One variation theme is to use neuron-specific learning rates and neighborhood sizes. Another is to use adaptive or even growing map structures. The goal of these variations is to enable the SOM to follow the topology of the underlying data set better or to achieve better quantization results. A usual drawback of these methods is that the visualization properties suffer or that the algorithm becomes computationally heavier.

Batch-algorithm: Another variant of the basic SOM is the batch algorithm. In it, the whole training set is gone through at once, and only after this is the map updated with the net effect of all the samples. The algorithm usually converges after a couple of iterations, and is much faster to calculate in Matlab than the normal sequential algorithm.
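
One batch iteration can be sketched as follows in Python/NumPy: each sample is first mapped to its BMU, and every weight vector is then replaced by the neighborhood-weighted mean of all samples. The function name and the radii used are illustrative, and the sketch does not reproduce the toolbox implementation.

# A minimal sketch of one batch update with a Gaussian neighborhood of radius sigma.
import numpy as np

def batch_step(W, X, grid, sigma):
    """One iteration of the batch algorithm."""
    # BMU index c(j) for every sample x_j
    bmus = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
    # kernel weights H[j, i] = h(||r_c(j) - r_i||) for every sample j and unit i
    d = np.linalg.norm(grid[bmus][:, None, :] - grid[None, :, :], axis=2)
    H = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    # net effect of all samples: kernel-weighted mean of the inputs for every unit
    return (H.T @ X) / H.sum(axis=0)[:, None]

rng = np.random.default_rng(0)
X = rng.random((500, 3))
grid = np.array([(r, c) for r in range(10) for c in range(10)], dtype=float)
W = X[rng.choice(len(X), size=100, replace=False)]
for sigma in (5.0, 3.0, 1.0):          # a few iterations with a shrinking radius
    W = batch_step(W, X, grid, sigma)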


somtlbx@mail.cis.hut.fi
Last modified: Thu Sep 4 18:04:18 EET DST 1997