Properties of the Toolbox

In this document, the properties of the beta version of the SOM Toolbox are listed.


Vectors and matrices
Names
Map size, lattice and shape
Initialization and training
SOM_PAK compatibility
Visualization
GUI tools


Vectors and matrices

There are two important properties that vectors in the context of SOM may have:
  1. missing components
  2. associated labels
In the Toolbox, missing components are marked by NaNs in the vectors. The labels are saved to a separate cell matrix (the 'labels' field in the structures). Neither the vector dimension nor the number of labels per vector is restricted (although your hardware will impose practical limits). Labels can be manipulated using the function som_label.

NOTE: The codebook of a SOM must not have missing components in its vectors.

NOTE: The labels in map and data structures are organized differently. In the data struct, the labels are in a cell matrix of strings. In the map struct, the labels form a cell matrix of cells, where each cell is a cell array of strings.

map matrix The codebook matrix in the self-organizing map structure is a (N+1)-dimensional matrix, where N is the dimension of the map grid. In 2-dimensional grid case (3-dimensional codebook matrix), the different properties of the matrix can be addressed as follows. For the N-dimensional case analogous rules apply.
the whole codebook
sM.codebook
one weight vector (coords x, y)
sM.codebook(x,y,:)
one component plane (ith)
sM.codebook(:,:,i)
a single value of the codebook matrix
sM.codebook(x,y,i) or sM.codebook(n), where n = ((i-1)*Y + (y-1))*X + x with X and Y the size of the matrix in x and y directions.
map grid size
sM.msize
all labels
sM.labels
labels of one weight vector
sM.labels{x,y}
ith label of one weight vector
sM.labels{x,y}{i}
Related functions: som_create.m, som_label.m
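The addressing rules above can be tried out on a small hand-built struct. This is only a sketch: the struct below is a stand-in assembled by hand (it is not produced by som_create.m), and the sizes and label strings are made up. Since Matlab stores matrices in column-major order, the linear index is cross-checked against the built-in sub2ind.

```matlab
% Hand-built stand-in for a map struct (NOT created by som_create):
% a 4-by-3 grid of 2-dimensional weight vectors.
X = 4; Y = 3; D = 2;
sM.codebook = reshape(1:X*Y*D, [X Y D]);   % each value equals its own linear index
sM.msize    = [X Y];
sM.labels   = cell(X, Y);
sM.labels{2,3} = {'first', 'second'};      % each unit holds a cell array of strings

x = 2; y = 3; i = 2;
w     = squeeze(sM.codebook(x,y,:));       % one weight vector, as a D-by-1 column
plane = sM.codebook(:,:,i);                % ith component plane, X-by-Y
v     = sM.codebook(x,y,i);                % a single value

% Column-major linear index of element (x,y,i), checked against sub2ind
n    = ((i-1)*Y + (y-1))*X + x;
same = (n == sub2ind(size(sM.codebook), x, y, i)) && (sM.codebook(n) == v);

lab2 = sM.labels{x,y}{2};                  % second label of unit (x,y)
```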
data matrix The data matrix is a 2-dimensional matrix with each row corresponding to one data vector.
the whole data set
sD.data
one data vector (ith)
sD.data(i,:)
all labels
sD.labels
labels of one data vector
sD.labels{i,:}
jth label of ith data vector
sD.labels{i,j}
Related functions: som_data_struct.m, som_label.m
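As a rough illustration of the data layout, the sketch below builds a stand-in data struct by hand (it does not call som_data_struct.m). The values and labels are invented, and using an empty string for an absent label is an assumption made only for this example.

```matlab
% Hand-built stand-in for a data struct (NOT created by som_data_struct).
sD.data   = [1.0 2.0 NaN;      % sample 1: third component missing
             0.5 NaN 3.0;      % sample 2: second component missing
             2.0 1.0 4.0];     % sample 3: complete
sD.labels = {'a', 'x'; 'b', ''; 'c', 'y'};   % one row of labels per data vector

known    = ~isnan(sD.data);    % logical mask of the components that are present
complete = all(known, 2);      % true for rows without missing components
v2       = sD.data(2,:);       % the second data vector
lab_2_1  = sD.labels{2,1};     % first label of the second data vector
```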


Names

To aid visualization and understanding of maps, the vector components, the map and the data structures can be named.

Name of map/data By default a map is named 'SOM date' and a data struct 'unknown'.
Related functions: som_name.m
Names of components The components are given names 'Var#' by default, where # is the ordinal number of the component.
Related functions: som_name.m


Map size, lattice and shape

A new feature of Matlab 5 is that it supports N-dimensional matrices. This offers a natural representation of the 3-dimensional weight vector matrix of the typical 2-dim SOM grid. It also gives natural representation for N-dim SOM grids.

NOTE: With SOM grids of more than two dimensions, visualization is no longer possible. :-(

DEFAULT VALUES:

msize
2-dimensional grid with 5*(number of data samples)^0.543 vectors divided in proportion of the two biggest principal components of the training data.
NOTE: for a low number of samples (<34) this produces more map units than there are samples. This is OK if the goal is just to get a feeling for the data, but not if the goal is probabilistic estimation. On the other hand, in the latter case one should have a lot more samples than 34 anyway... With 10000 samples, the equation gives ~13 samples per map unit.
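The default rule for the number of map units can be evaluated directly. The sketch below covers only the 5*n^0.543 part for a few hypothetical sample counts; dividing the units into side lengths according to the principal components is not shown.

```matlab
% Number of map units given by the default rule 5 * n^0.543
n_samples = [20 34 1000 10000];        % hypothetical sample counts
n_units   = 5 * n_samples .^ 0.543;    % (non-integer) number of map units
per_unit  = n_samples ./ n_units;      % average number of samples per map unit
```

With these inputs, per_unit is below 1 for 20 samples and close to 13 for 10000 samples, in line with the note above.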
lattice
'hexa'
shape
'rect'

map size There are no restrictions to the map size or dimension.
Related functions: som_create.m, som_init.m
map lattice Map lattice is the local topology of the map, i.e., it dictates the net structure of the map and neighbors of map units. There are two lattices:
  • rectangular ('rect')
  • hexagonal ('hexa')
The latter works only for 2-dimensional map grids.
Related functions: som_trainops.m, som_unit_distances.m, som_unit_coords.m, som_unit_neighborhood.m
map shape There are three different map shapes:
  • rectangular ('rect')
  • cylinder ('cyl')
  • toroid ('toroid')
Related functions: som_unit_distances.m, som_unit_coords.m, som_unit_neighborhood.m


Initialization and training

The different training options include neighborhood function, initialization and training algorithms, neighborhood radius, learning function and coefficient. The best-matching unit (BMU) search can be modified by using a mask vector: each component in the BMU search is weighted by the corresponding component of the mask vector.

DEFAULT VALUES:

init_type
'linear'
train_type
'batch'
neigh
'gaussian'
epochs
either 5 * (vectors on the map)/(vectors in the data) (first run) or 3 times that (otherwise)
initial radius
either half of maximum sidelength of the grid (first run) or 10% of that (otherwise)
final radius
initial radius of second run (first run), or 1 (otherwise)
initial alpha
either 0.5 (first run) or 0.05 (otherwise)
alpha function type
'linear'

NOTE: The som_train.m function takes into account whether the map structure has been trained before or not (this information is found in the train_sequence field of the map struct). If the map has already been trained (and the parameters are not explicitly specified), the latter default values are used.

initialization Two initialization algorithms are implemented:
  • linear initialization ('linear')
  • random initialization ('random')
Related functions: som_init.m, som_lininit.m, som_randinit.m
neighborhood function Four neighborhood functions are implemented:
  • bubble (aka box) ('bubble', step function step(x <= R) = 1 if x <= R, 0 otherwise)
  • gaussian ('gauss', exp(-x^2 / (2*R^2)))
  • cut gaussian ('cutgauss', exp(-x^2 / (2*R^2)) * step(x <= R))
  • Epanechnikov ('ep', (1 - x^2/R^2) * step(x <= R))
The differences between the functions become apparent from the figure below. [Neighborhood functions]
Related functions: som_seqtrain.m, som_batchtrain.m, som_trainops.m
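The four neighborhood functions listed above can be written out in a few lines. This is a sketch in plain Matlab, not the Toolbox code; the radius and the distance grid are arbitrary example values.

```matlab
R = 2;                          % neighborhood radius (example value)
x = 0:0.5:4;                    % distances from the BMU on the map grid

step     = double(x <= R);      % 1 inside the radius, 0 outside
bubble   = step;
gauss    = exp(-x.^2 / (2*R^2));
cutgauss = gauss .* step;       % gaussian truncated at the radius
ep       = (1 - x.^2 / R^2) .* step;
```

Plotting the four vectors against x, e.g. with plot(x, [bubble; gauss; cutgauss; ep]), shows how they differ.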
training parameters The learning coefficient (alpha(t)) decreases according to the learning rate function from the given initial value to zero in the end.
The neighborhood radius decreases linearly from given initial value to given final value.
Both learning coefficient and radius can also be given explicitly for each training step.
The length of training is given in epochs (1 epoch = as many training steps as there are samples in the training set).
The training can be tracked. By default the algorithms only estimate time needed for training. Other options are:
  • tracking the estimated time and quantization error
  • plotting the quantization error
  • plotting the quantization error and first two components of the map weight vectors
Related functions: som_train.m, som_seqtrain.m, som_batchtrain.m
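A minimal sketch of such parameter schedules, assuming the 'linear' alpha function type and a made-up number of training steps. The variable names are illustrative only and are not Toolbox parameters.

```matlab
n_steps   = 5;          % hypothetical number of training steps
alpha_ini = 0.5;        % initial learning coefficient
r_ini     = 3;          % initial neighborhood radius
r_fin     = 1;          % final neighborhood radius

alpha  = linspace(alpha_ini, 0, n_steps);   % linear decrease to zero at the end
radius = linspace(r_ini, r_fin, n_steps);   % linear decrease to the final radius
```

Vectors like alpha and radius correspond to giving the learning coefficient and radius explicitly for each training step.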
component masks The distance in, e.g., the function som_bmus.m is calculated as sum(((v1 - v2).*m).^2), where v1 and v2 are vectors and m is the mask vector. By default the mask vector (mask field in the map struct) is a vector of ones. By setting a component in the vector to 0, the corresponding data component is ignored in the distance calculation. Notice that this masking normally produces a different result from simple preprocessing v_new = v_old.*w.
Related functions: som_trainops.m, som_seqtrain.m, som_batchtrain.m, som_bmus.m
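The effect of a zero in the mask can be seen with a toy codebook. The sketch below applies the distance formula above in plain Matlab; it does not call som_bmus.m, and the codebook is stored as one weight vector per row purely for convenience.

```matlab
codebook = [0 0 0;      % three map units with 3-dimensional weight vectors
            1 1 5;
            2 2 9];
v = [1 1 0];            % data vector
m = [1 1 0];            % mask: ignore the third component

n_units  = size(codebook, 1);
diffs    = codebook - repmat(v, n_units, 1);
d_masked = sum((diffs .* repmat(m, n_units, 1)).^2, 2);   % sum(((v1 - v2).*m).^2)
d_plain  = sum(diffs.^2, 2);                              % mask of ones

[dmin_masked, bmu_masked] = min(d_masked);   % BMU with the third component ignored
[dmin_plain,  bmu_plain]  = min(d_plain);    % BMU with all components used
```

Here the unmasked search picks unit 1, while the masked search picks unit 2, whose first two components match the data vector exactly.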
training algorithms Two training algorithms are implemented:
  • the traditional sequential training ('seq')
  • a faster batch-training algorithm ('batch')
Related functions: som_seqtrain.m, som_batchtrain.m


SOM_PAK compatibility

As the SOM Toolbox is an extension for the SOM_PAK package, it is necessary to offer compatibility functions between the two packages. These functions read and write data and map files in SOM_PAK format: som_read_cod.m, som_write_cod.m, som_read_data.m, som_write_data.m.

NOTE: One thing is added to the basic SOM_PAK format. The names of the components are added to the beginning of *.data and *.cod files as a comment line. The line is ignored by SOM_PAK, but the functions som_read_cod.m or som_read_data.m are able to use it. The format of the line is:

    #n name1 name2 name3 ...
That is: the line begins with #n which is followed by the names of the components separated by white spaces (spaces and/or tabs). The functions can also read this component name information back from the file.
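For illustration, a line in this format could be split into component names as follows. This is a sketch in plain Matlab, not the code used by som_read_cod.m or som_read_data.m, and the header line is a made-up example.

```matlab
line = sprintf('#n temperature\tpressure   flow');   % hypothetical header line

names = {};
if strncmp(line, '#n', 2)
    rest  = strtrim(line(3:end));            % drop the '#n' marker
    names = regexp(rest, '\s+', 'split');    % split on runs of spaces and/or tabs
end
```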

NOTE: There is one bad thing about the read and write functions: they are slow. This is because they are based on for-loops and contain many if-then statements, which unfortunately execute very slowly in Matlab. Therefore, whenever possible, use Matlab's own save and load functions to save your maps and data, e.g.

    save map1.mat map
where map1.mat is the file and map is the variable.
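A round trip with save and load might look like the sketch below. The map variable here is a hand-built stand-in rather than a real map struct, and the file is written to the temporary directory only to keep the example self-contained.

```matlab
map   = struct('codebook', rand(3,2,4), 'msize', [3 2]);   % stand-in variable
orig  = map;                                               % copy for comparison
fname = fullfile(tempdir, 'map1.mat');

save(fname, 'map');       % function form of: save map1.mat map
clear map
loaded = load(fname);     % struct with one field per saved variable
map    = loaded.map;
delete(fname);            % remove the example file again
```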

NOTE: There are a few features of the SOM_PAK that are not supported in the Toolbox:


Visualization

The versatile visualization offered by Matlab is one of the main reasons why the SOM Toolbox project was initiated. In the beta version most visualization is still in 2D, although Matlab also makes 3D visualization easy. The primary visualization function som_show.m uses 2D visualization. The function som_showgrid.m can also make primitive 3D visualizations.

There are three approaches to visualization: matrix-level functions, struct-level functions and the GUI. The first two are presented here; the last is presented in the next section.

Matrix-level functions

The basic visualization tools are the som_planeX.m functions, where X stands for one (or no) uppercase letter or number. With these tools it is possible to visualize a component plane or a unified distance matrix and to label it in different manners. There are some common features in these functions:

  1. These functions do not handle map or data structures. Their input is a data or coordinate matrix. The lattice has to be specified, too.
  2. The planes are oriented the way the corresponding matrix would be printed in MATLAB's command window. Other tools should be consistent with this. Plane visualization sets axis scaling so that nodes are squares or hexagons and the following is true for coordinates.
  3. The node (i,j) has coordinates (i,j) in the rectangular lattice. If the lattice is hexagonal, the coordinates are (i,j) for the odd and (i,j+0.5) for the even rows.
  4. All handles to the graphical objects created are returned by the functions. Some of the return values are structs for clarity reasons. Instead of using a complex set of functions and parameters, the user may operate directly on these handles in order to achieve a customized look for the visualization.
  5. All objects created are tagged, that is, the creating function writes a string to the 'Tag' fields of the objects. It is possible to find the objects later using these tags even if the object handles are lost.
  6. Automatic coloring according to data values (flat facecolor for patch objects) is used. The user may alter the colormap or insert a colorbar at any time using Matlab workspace commands.
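The coordinate convention in item 3 can be sketched in plain Matlab. The code below is an illustration only, not the Toolbox implementation, and it assumes that i is the row index and j the column index of a node.

```matlab
msize = [4 3];                               % grid with 4 rows and 3 columns
[j, i] = meshgrid(1:msize(2), 1:msize(1));   % i = row index, j = column index
rect = [i(:) j(:)];                          % rectangular lattice: node (i,j) at (i,j)

hexa = rect;
even = mod(hexa(:,1), 2) == 0;               % nodes on even rows
hexa(even, 2) = hexa(even, 2) + 0.5;         % hexagonal lattice: (i,j+0.5) on even rows
```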

FUNCTIONS:

som_plane (Tag on the axis object: Componentplane)
This draws a component plane, which consists of hexagonal or rectangular nodes. The gaps between nodes may be specified.
som_plane3 (Tag on the axis object: Componentplane)
This is merely a demo. The component plane can be viewed in 3D. The z-axis may be used to bring some additional information to the visualization. Otherwise this is a copy of som_plane.
som_planeU (Tag on the axis object: Umatrix)
A low-level visualization tool that draws the unified distance matrix produced by the 'som_umat' function.
som_planeL (Tags on the text objects: Lab)
Given labels are printed on coordinates specified by a matrix. Text color and font size may be specified.
som_planeH (Tag on the patch/text objects: Hit)
This function can be used to present e.g. a hit distribution calculated by som_hits, or any matrix in general. The output may be graphical or numerical. (A hit distribution may be visualized using the gap specification feature of the som_plane function, too).
som_planeT (Tag on the patch/text objects: Traj)
This function visualizes a sequence of coordinates as a line connecting several nodes. It draws labels to the points as well, if required. The line style, color, font color and size may be specified.
The som_plane, som_plane3 and som_planeU functions actually draw the visualization of a plane. The rest of the som_planeX functions are meant to be used on top of this visualization. Of course, they may be plotted on any axis.

The function som_manualclassify (Tags on the objects: Sel) belongs to the matrix-level functions. It can be used to manually classify map nodes. The function draws extra borders on nodes in a specified plane. The color of the borders may be changed by clicking on them and on a color palette. The function returns a matrix according to the classification. Unfortunately som_manualclassify still lacks its counterpart among the struct-level functions, see below.

EXAMPLES:

  1. handle1=som_plane('hexa',rand(10,15),rand(10,15))
    produces a hexagonal random plane with random gaps on the current axis.
  2. handle2=som_planeL('hexa',[[2 1];[3 1]],'abc');
    labels nodes (2,1) and (3,1) with text 'abc'
  3. set(handle2,'Color','red');
    changes the font color of the labels to red. See Matlab's User Guide for help on object properties.
  4. delete(handle2)
    deletes labels.
  5. handle3=findobj(1,'Tag','Hit')
    searches for the hit marks produced by the function som_planeH in figure object number 1.

Struct-level functions and utilities

The struct-level visualization works in two phases. First, a background image is drawn with som_show and then different kinds of information can be added on top of it with som_add*: hit histograms, labels and trajectories. The add-on functions return a vector of graphics handles, so they can be easily manipulated (set) or removed (delete).

Add-on visualizations may be removed with the som_clear function.

The figure object drawn by som_show has colorbars whose scaling may be redone in several manners by the som_recolorbar tool. Using this function, the original data scaling may be restored for visualization purposes.

The map can also be visualized by projecting its weight vectors to a lower dimension. Sammon's mapping is an often used projection technique. It visualizes the shape of the SOM more efficiently than the u-matrix does, but it is computationally heavy. Other projection methods include PCA (Principal Component Analysis; linear projection) and CCA (Curvilinear Component Analysis; nonlinear projection). The function som_projection can be used to calculate projections of maps and data sets, and the function som_showgrid can be used to visualize map projections.

FUNCTIONS:

som_show
The big brother of som_plane. A map structure is visualized: multiple planes and colorbars are presented in the same figure. (Note: the UserData object property field of the figure is reserved for the SOM Toolbox's use.) som_show can visualize component planes, u-matrices and empty planes, which are convenient for labeling and hit marking.
som_addlabels, som_addhits, som_addtraj
Auxiliary tools for labeling etc. the planes made by som_show. Note: These functions can be used only on the figures created by som_show.
som_recolorbar
Can be used to refresh the colorbars in a visualization after colormap changes and to rescale the tick scaling in the colorbars. Note: This function can be used only on the figures created by som_show.
som_showtitle
Plots a movable info text onto a figure produced by som_show. Requires the subfunction som_showtitleButtonDownFcn.
som_clear
This tool is to clear labels from specified subplots even without knowing the object handles.
som_umat
Calculates the U-matrix of a map.
som_projection, som_sammon, som_cca, som_pca
These functions are used to project maps and data sets to a lower dimension for visualization with som_showgrid
som_showgrid
Visualizes the grid of the given map, or the given data matrix. Especially used to visualize map projections.
som_profile
Visualizes the model vectors of a given map. It uses a conventional visualization method, e.g. a pie diagram or bar plot, to present one model vector. The plots of model vectors are organized according to the map grid. The function is quite slow for big maps.

EXAMPLES:

  1. handle1=som_show(sMap, [1 2 3],'denormalized');
    Draw component planes 1,2 and 3 from sMap. Use original data scaling in colorbars.
  2. sMap=som_autolabel(sMap,sData); som_addlabels(sMap, 'all', 3);
    Label the 3rd subplot in the figure with the labels in the sMap.
  3. handle2=findobj('Tag','Lab'); set(handle2, 'FontSize', 20);
    Enlarge the font size afterwards. som_addtraj tags the objects with the string 'Traj'.
  4. sam=som_sammon(sMap, 3, 50); som_showgrid(sMap, sam);
    First calculate a 3D Sammon's projection and then visualize it.
  5. som_showgrid(sMap);
    Show the map grid of the map.
  6. som_profile(sMap,'BAR_AXIS_OFF');
    Show the model vectors using bar plot with no axes.


Graphical User Interface

The graphical user interface is an additional component of the SOM Toolbox. It is provided to make both construction and visualization of maps easier. The downside is that the GUIs are not as flexible as command line functions, so experienced users will no doubt rather use the Toolbox from the command line.

Initialization and training One tool is used to offer the multitude of initialization and training options to the user.
Related functions: somui_it.m
Visualization Visualization is also handled with a single tool.
Related functions: somui_vis.m


somtlbx@mail.cis.hut.fi
Last modified: Fri Dec 19 14:02:51 EET 1997