Properties of the Toolbox
In this document, the properties of the beta version of the SOM
Toolbox are listed.
Vectors and matrices
Names
Map size, lattice and shape
Initialization and training
SOM_PAK compatibility
Visualization
GUI tools
Vectors and matrices
There are two important properties that vectors in the context of SOM may have:
- missing components
- associated labels
In the Toolbox, missing components are marked by 'NaN's in the
vectors. The labels are saved to a separate cell matrix (the 'labels' field
in the structures). Neither the vector dimension nor the number of labels per
vector is restricted (although your hardware will of course impose limits).
Labels can be manipulated using the function som_label.
NOTE: The codebook of a SOM must
not have missing components in its vectors.
NOTE: The labels in map and data structures are organized
differently. In the data struct, the labels are in a cell matrix of strings.
In the map struct, the labels form a cell matrix of cells, where each
cell is a cell array of strings.
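As a small illustration of the NaN convention, the sketch below builds a plain data matrix with missing components and locates them. It uses only base Matlab functions; the variable names (D, labels, n_missing) and the values are hypothetical and not part of the Toolbox.

```matlab
% Hypothetical data matrix: one data vector per row, NaN marks a missing component.
D = [5.1 3.5 NaN; ...
     4.9 NaN 1.4; ...
     6.2 2.9 4.3];

% Hypothetical labels, one string per data vector, kept in a separate cell matrix.
labels = {'a'; 'b'; 'c'};

missing       = isnan(D);              % logical matrix, true where a component is missing
n_missing     = sum(missing, 2);       % number of missing components per data vector
complete_rows = find(n_missing == 0);  % indices of vectors without missing components
```

Since the codebook of a SOM must not contain missing components, a check such as any(isnan(D(:))) can be applied to a matrix before it is used as a codebook.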
map matrix
| The codebook matrix in the self-organizing map structure
is a (N+1)-dimensional matrix, where N is the dimension
of the map grid. In 2-dimensional grid case (3-dimensional
codebook matrix), the different properties
of the matrix can be addressed as follows. For the N-dimensional
case analogous rules apply.
- the whole codebook
- sM.codebook
- one weight vector (coords x, y)
- sM.codebook(x,y,:)
- one component plane (ith)
- sM.codebook(:,:,i)
- a single value of the codebook matrix
- sM.codebook(x,y,i) or
sM.codebook(n), where
n = ((i-1)*Y + (y-1))*X + x with
X and Y the size
of the matrix in x and y directions (Matlab stores matrices in column-major order).
- map grid size
- sM.msize
- all labels
- sM.labels
- labels of one weight vector
- sM.labels{x,y}
- ith label of one weight vector
- sM.labels{x,y}{i}
|
Related functions: som_create.m, som_label.m
|
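The addressing rules for the codebook matrix can be tried out on a stand-in array. The sketch below assumes a hypothetical 4 x 3 map grid with 2 components and uses a plain 3-dimensional array C in place of sM.codebook. Because Matlab stores arrays in column-major order, the linear index of element (x,y,i) is x + (y-1)*X + (i-1)*X*Y, which the built-in sub2ind confirms.

```matlab
X = 4; Y = 3; dim = 2;                   % hypothetical grid size and vector dimension
C = reshape(1:X*Y*dim, [X Y dim]);       % stand-in for sM.codebook

x = 2; y = 3; i = 2;
w     = squeeze(C(x, y, :))';            % one weight vector (as a row vector)
plane = C(:, :, i);                      % the ith component plane
value = C(x, y, i);                      % a single value

n     = ((i-1)*Y + (y-1))*X + x;         % linear index, column-major order
n_ref = sub2ind(size(C), x, y, i);       % the same index computed by Matlab
```

Here C(n) and C(x,y,i) refer to the same element.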
data matrix
| The data matrix is a 2-dimensional matrix with each row
corresponding to one data vector.
- the whole data set
- sD.data
- one data vector (ith)
- sD.data(i,:)
- all labels
- sD.labels
- labels of one data vector
- sD.labels{i,:}
- jth label of ith data vector
- sD.labels{i,j}
|
Related functions: som_data_struct.m, som_label.m
|
Names
To aid visualization and understanding of maps, the vector
components as well as the map and data structures can be named.
Name of map/data
| By default a map will be named:
'SOM date'
and a data struct: 'unknown'.
|
Related functions: som_name.m
|
Names of components
| The components are given names 'Var#'
by default, where #
is the ordinal number of the component.
|
Related functions: som_name.m
|
Map size, lattice and shape
A new feature of Matlab 5 is that it supports N-dimensional
matrices. This offers a natural representation of the 3-dimensional
weight vector matrix of the typical 2-dim SOM grid. It also gives a
natural representation for N-dim SOM grids.
NOTE: When using a SOM grid with more than 2 dimensions,
visualization is no longer possible. :-(
DEFAULT VALUES:
- msize
- 2-dimensional grid with
5*(number of data samples)^0.543
vectors divided in proportion of the two biggest principal
components of the training data.
NOTE: for a low number
of samples (<34) this produces more map units than
there are samples. This is OK if the goal is just to get a
feeling for the data, but not if the goal is probabilistic
estimation. On the other hand, in the latter case one should have a lot
more samples than 34 anyway. With 10000 samples,
the equation gives ~13 samples per map unit.
- lattice
- 'hexa'
- shape
- 'rect'
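The default number of map units can be checked with a few lines of arithmetic. The sketch below only evaluates the formula 5*(number of data samples)^0.543 for some hypothetical sample counts. How the Toolbox rounds the result and splits it into the two side lengths is not shown here.

```matlab
n      = [20 33 34 1000 10000];     % hypothetical numbers of data samples
munits = 5 * n.^0.543;              % default number of map units (before rounding)
ratio  = n ./ munits;               % samples per map unit

more_units_than_samples = munits > n;   % true for n = 20 and n = 33 only
```

For 10000 samples this gives about 743 map units, i.e. roughly 13 samples per unit.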
map size
| There are no restrictions on the map size or dimension.
|
Related functions: som_create.m, som_init.m
|
map lattice
| Map lattice is the local topology of the map, i.e., it
dictates the net structure of the map and neighbors
of map units. There are two lattices:
- rectangular ('rect')
- hexagonal ('hexa')
The latter works only for 2-dimensional map grids.
|
Related functions:
som_trainops.m, som_unit_distances.m, som_unit_coords.m, som_unit_neighborhood.m
|
map shape
| There are three different map shapes:
- rectangular ('rect')
- cylinder ('cyl')
- toroid ('toroid')
|
Related functions: som_unit_distances.m,
som_unit_coords.m, som_unit_neighborhood.m
|
Initialization and training
The different training options include neighborhood function,
initialization and training algorithms, neighborhood radius,
learning function and coefficient. The best-matching unit (BMU) search
can be modified by using a mask vector: each component's contribution to the BMU
search is weighted by the corresponding component of the mask vector.
DEFAULT VALUES:
- init_type
- 'linear'
- train_type
- 'batch'
- neigh
- 'gaussian'
- epochs
- either 5 * (vectors on the map)/(vectors in the data)
(first run) or 3 times that (otherwise)
- initial radius
- either half of maximum sidelength of the grid (first run)
or 10% of that (otherwise)
- final radius
- initial radius of second run (first run), or 1 (otherwise)
- initial alpha
- either 0.5 (first run) or 0.05 (otherwise)
- alpha function type
- 'linear'
NOTE: The som_train.m function takes into account
whether the map structure has been trained before or not (this
information is found in the train_sequence field of the map
struct). If the map has already been trained (and the parameters
are not explicitly specified), the latter default values are used.
initialization
| Two initialization algorithms are implemented:
- linear initialization ('linear')
- random initialization ('random')
|
Related functions: som_init.m, som_lininit.m, som_randinit.m
|
neighborhood function
| Four neighborhood functions are implemented:
- bubble (aka box) ('bubble', step function step(x <= R) = 1 if x <= R, 0 otherwise)
- gaussian ('gauss', exp(-x^2 / (2*R^2)))
- cut gaussian ('cutgauss', exp(-x^2 / (2*R^2)) * step(x <= R))
- Epanechnikov ('ep', (1 - x^2/R^2) * step(x <= R))
The differences between the functions become apparent
from the figure below.
|
Related functions: som_seqtrain.m,
som_batchtrain.m, som_trainops.m
|
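The four neighborhood functions listed above can be evaluated directly from their formulas. The following is a minimal sketch in base Matlab, not the Toolbox implementation; x is taken to be the distance from the BMU on the map grid, R the neighborhood radius, and the variable names are hypothetical.

```matlab
R = 2;                                  % neighborhood radius
x = 0:0.5:4;                            % distances from the BMU on the map grid

in_radius  = double(x <= R);            % step(x <= R)
h_bubble   = in_radius;                 % bubble (box)
h_gauss    = exp(-x.^2 / (2*R^2));      % gaussian
h_cutgauss = h_gauss .* in_radius;      % gaussian cut off at the radius
h_ep       = (1 - x.^2 / R^2) .* in_radius;   % Epanechnikov

% plot(x, [h_bubble; h_gauss; h_cutgauss; h_ep]) shows the four curves together.
```

Only the gaussian is nonzero beyond the radius; the other three vanish for x > R.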
training parameters
| The learning coefficient (alpha(t))
decreases according to the learning rate function from the given
initial value to zero at the end of training.
The neighborhood radius
decreases linearly from given initial value to
given final value.
Both learning coefficient and radius
can also be given explicitly for each training step.
The length of training is given in epochs (1 epoch =
number of samples in the training set).
The training can be tracked.
By default the algorithms only estimate time needed for
training. Other options are:
- tracking the estimated time and quantization error
- plotting the quantization error
- plotting the quantization error and first
two components of the map weight vectors
|
Related functions: som_train.m, som_seqtrain.m, som_batchtrain.m
|
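As an illustration of the schedules described above, the sketch below builds a linearly decreasing learning coefficient and neighborhood radius for a hypothetical number of training steps. It assumes the 'linear' alpha function and equally spaced steps; the exact sampling used inside som_seqtrain.m and som_batchtrain.m may differ.

```matlab
trainlen   = 10;     % hypothetical number of training steps
alpha_ini  = 0.5;    % initial learning coefficient (first-run default)
radius_ini = 5;      % hypothetical initial radius
radius_fin = 1;      % hypothetical final radius

alpha  = linspace(alpha_ini, 0, trainlen);            % from initial value to zero
radius = linspace(radius_ini, radius_fin, trainlen);  % from initial to final value
```

Vectors like these correspond to giving the learning coefficient and radius explicitly for each training step.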
component masks
| The distance, e.g. in function som_bmus.m,
is calculated as sum(((v1 - v2).*m).^2), where
v1 and v2 are
two vectors and m is the mask vector.
By default the mask vector (mask field in
the map struct) is a vector of ones. By setting
a component in the vector to 0, the corresponding data
component is ignored in the
distance calculation. Notice that this masking
normally produces a different result from simply
preprocessing the data as v_new = v_old.*m.
|
Related functions: som_trainops.m, som_seqtrain.m, som_batchtrain.m, som_bmus.m
|
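The effect of the mask on the BMU search can be seen in a small example. The sketch below applies the masked distance formula to a hypothetical codebook stored as a plain 2-dimensional matrix (one weight vector per row); it is not the code of som_bmus.m, and all names and values are made up for illustration.

```matlab
M = [0 0  0; ...     % hypothetical codebook, one weight vector per row
     1 1 10; ...
     2 2  2];
v = [1 1 0];         % a data vector

m_ones = [1 1 1];    % default mask: all components count
m_mask = [1 1 0];    % third component is ignored

d_ones = zeros(size(M, 1), 1);
d_mask = zeros(size(M, 1), 1);
for k = 1:size(M, 1)
    d_ones(k) = sum(((M(k, :) - v) .* m_ones).^2);
    d_mask(k) = sum(((M(k, :) - v) .* m_mask).^2);
end

[dmin_ones, bmu_ones] = min(d_ones);   % BMU with the default mask
[dmin_mask, bmu_mask] = min(d_mask);   % BMU when the third component is masked out
```

With the default mask the first unit is the BMU; once the third component is masked out, the second unit matches the data vector exactly.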
training algorithms
| Two training algorithms are implemented:
- the traditional sequential training ('seq')
- a faster batch-training algorithm ('batch')
|
Related functions: som_seqtrain.m, som_batchtrain.m
|
SOM_PAK compatibility
As the SOM Toolbox is an extension for the SOM_PAK package,
it is necessary to offer compatibility functions between the two
packages. These functions read and write data and map files in
SOM_PAK format: som_read_cod.m, som_write_cod.m,
som_read_data.m, som_write_data.m.
NOTE: One thing is added to the basic SOM_PAK
format. The names of the components are added to the beginning of
*.data and *.cod files as a comment line. The line is ignored by
SOM_PAK, but the functions som_read_cod.m
or som_read_data.m are able to use it. The format of the line is:
#n name1 name2 name3 ...
That is: the line begins with #n which is followed by the
names of the components separated by white spaces (spaces and/or tabs).
The functions can also read this component name information back from
the file.
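To make the format of the component name line concrete, the sketch below splits such a line into names using base Matlab only. It is an illustration of the format, not the parser used by som_read_cod.m or som_read_data.m, and the header string is a made-up example.

```matlab
% Hypothetical header line: '#n' followed by names separated by spaces and/or tabs.
hdr = sprintf('#n sepal_len\tsepal_wid  petal_len');

tokens = regexp(hdr, '\S+', 'match');    % split at any run of white space
if ~isempty(tokens) && strcmp(tokens{1}, '#n')
    comp_names = tokens(2:end);          % cell array of component names
else
    comp_names = {};                     % not a component name line
end
```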
NOTE: The read and write functions have one drawback:
they are slow. They are based on for-loops and contain many if-then
statements, which Matlab executes slowly. Therefore,
whenever possible, use Matlab's own save and load
functions to save your maps and data, e.g.
save map1.mat map
where map1.mat is the file and map is the variable.
NOTE: There are a few features of the SOM_PAK
that are not supported in the Toolbox:
- "fixed-point qualifiers" for forcing some input vectors to specified
locations on the map
- weighting of specific input samples in the training
- snapshots of the codebook during learning
- the advanced features (buffered loading of data,
data redirection and compression, and use of environment variables)
are only supported in the way that Matlab itself supports them
Visualization
The versatile visualization offered by Matlab is one of the main
reasons why the SOM Toolbox project was initiated. In the beta
version most visualization is still in 2D, although Matlab also makes
3D visualization easy. The primary visualization function som_show.m
uses 2D visualization. The som_showgrid.m function can also make
primitive 3D visualizations.
There are three approaches to visualization: matrix-level
functions, struct-level functions and the GUI. The first two are
presented here; the last is presented in the next section.
Matrix-level functions
The basic visualization tools are the som_planeX.m functions
where X stands for one (or no) uppercase letter/number.
With these tools it is possible to visualize a component plane or a
unified distance matrix and to label it in different ways.
There are some common features in these functions:
- These functions do not handle map or data structures. Their input
is a data or coordinate matrix. The lattice has to be specified, too.
- The planes are oriented as the corresponding matrix would be printed in
MATLAB's command window. Other tools should be consistent with this.
Plane visualization sets axis scaling so that nodes are squares
or hexagons and the following is true for coordinates.
- The node (i,j) has coordinates (i,j) in the rectangular lattice. If
the lattice is hexagonal, the coordinates are (i,j) for the odd and
(i,j+0.5) for the even rows.
- All handles to the graphical objects created are returned by
the functions. Some of the return values are structs for clarity.
Instead of using a complex set of functions and parameters,
the user may operate directly on these handles in order to achieve
a customized look for the visualization.
- All objects created are tagged, that is, the creating function
writes a string to the 'Tag' fields of the objects. It is possible to
find the objects later using these tags even if the object
handles are lost.
- Automatic coloring according to data values (flat facecolor for
patch objects) is used. The user may alter the colormap or insert a
colorbar at any time using Matlab workspace commands.
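The coordinate rule given in the list above (node (i,j) at (i,j) in the rectangular lattice, and even rows shifted by 0.5 in the hexagonal lattice) can be written out as follows. This is a sketch in base Matlab for a hypothetical 3 x 4 plane; it only computes the coordinate pairs and is not the code used by the som_planeX functions.

```matlab
msize = [3 4];                                  % hypothetical plane size: 3 rows, 4 columns
[j, i] = meshgrid(1:msize(2), 1:msize(1));      % i = row index, j = column index

coords_rect = [i(:) j(:)];                      % node (i,j) has coordinates (i,j)

shift       = 0.5 * (mod(i(:), 2) == 0);        % 0.5 for even rows, 0 for odd rows
coords_hexa = [i(:) j(:) + shift];              % (i,j) for odd and (i,j+0.5) for even rows
```

Each row of coords_rect and coords_hexa holds the coordinates of one node, listed in the same column-major order as the elements of the plane matrix.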
FUNCTIONS:
- som_plane (Tag on the axis object: Componentplane)
- This draws a component plane, which consists of hexagonal or
rectangular nodes. The gaps between nodes may be specified.
- som_plane3 (Tag on the axis object: Componentplane)
- This is merely a demo. The component plane can be viewed in 3D. The
z-axis may be used to bring some additional information to the
visualization. Otherwise this is a copy of som_plane.
- som_planeU (Tag on the axis object: Umatrix)
- This is a low-level visualization tool for the unified distance
matrix, i.e. the output of the som_umat function.
- som_planeL (Tags on the text objects: Lab)
- Given labels are printed on coordinates specified by a matrix. Text color and font size may be specified.
- som_planeH (Tag on the patch/text objects: Hit)
- This function can be used to present e.g. a hit distribution
calculated by som_hits, or any matrix in general.
The output may be graphical or numerical.
(A hit distribution may be visualized using the
gap specification feature of the som_plane function, too).
- som_planeT (Tag on the patch/text objects: Traj)
- This function visualizes a sequence of coordinates as a line
connecting several nodes. It draws labels to the points as well,
if required. The line style, color, font color and size may be
specified.
The som_plane, som_plane3 and som_planeU
functions actually draw the visualization of a plane. The rest of
the som_planeX functions are meant to be used on top of this
visualization, although they may be plotted on any axis.
The function som_manualclassify (Tags on the objects: Sel)
also belongs to the matrix-level functions. It can be used to manually
classify map nodes. The function draws extra borders on nodes in a
specified plane. The color of the borders may be changed by clicking
them and then a color palette. The function returns a matrix according to
the classification. Unfortunately som_manualclassify still lacks a
counterpart in the struct-level functions, see below.
EXAMPLES:
- handle1=som_plane('hexa',rand(10,15),rand(10,15))
produces a hexagonal random plane with random gaps on
the current axis.
- handle2=som_planeL('hexa',[[2 1];[3 1]],'abc');
labels nodes (2,1) and (3,1) with text 'abc'
- set(handle2,'Color','red');
changes the font color of the labels to red. See Matlab's User Guide
for help on object properties.
- delete(handle2)
deletes labels.
- handle3=findobj(1,'Tag','Hit')
searches for the hit marks produced by the function som_planeH in
figure object number 1.
Struct-level functions and utilities
The struct-level visualization works in two phases. First, a background
image is drawn with som_show and then different kinds
of information can be added on top of it with som_add*: hit
histograms, labels and trajectories. The add-on functions return a
vector of graphics handles, so they can be easily manipulated (set)
or removed (delete).
Add-on visualizations may be removed with the som_clear
function.
The figure object drawn by som_show has colorbars whose
scaling may be redone in several ways with the som_recolorbar
tool. Using this function, the original data scaling may be
restored for visualization purposes.
The map can also be visualized by projecting its weight vectors to
a lower dimension. Sammon's mapping is an often used projection
technique. It visualizes the shape of the SOM more
effectively than the u-matrix, but it is computationally heavy.
Other projection methods
include PCA (Principal Component Analysis; linear projection) and
CCA (Curvilinear Component Analysis; nonlinear projection).
The function som_projection can be used to calculate
projections of maps and data sets and the function som_showgrid
can be used to visualize map projections.
FUNCTIONS:
- som_show
- The big brother of som_plane. A map structure is
visualized: multiple planes and colorbars are presented
in the same figure. (Note: the UserData object property field of
the figure is reserved for the SOM Toolbox's use.)
som_show can visualize component planes, u-matrices and
empty planes, which are convenient for labeling and hit
marking.
- som_addlabels, som_addhits, som_addtraj
- Auxiliary tools for labeling etc. the planes made by som_show.
Note: These functions can be used only on the figures
created by som_show.
- som_recolorbar
- Can be used to refresh the colorbars in a visualization after colormap
changes and to rescale the tick scaling in the colorbars. Note: This function
can be used only on the figures created by som_show.
- som_showtitle
- Plots a movable info text onto a figure produced by som_show. Requires
the subfunction som_showtitleButtonDownFcn.
- som_clear
- This tool is to clear labels from specified subplots
even without knowing the object handles.
- som_umat
- Calculates the U-matrix of a map.
- som_projection, som_sammon, som_cca, som_pca
- These functions are used to project maps and data sets to a lower
dimension for visualization with som_showgrid.
- som_showgrid
- Visualizes the grid of the given map, or the given data
matrix. Especially used to visualize map projections.
- som_profile
- Visualizes the model vectors of a given map. It uses a conventional
visualization method, e.g. pie diagram or bar plot, to present one
model vector. The plots of model vectors are organized according to the
map grid. The function is quite slow for big maps.
EXAMPLES:
- handle1=som_show(sMap, [1 2 3],'denormalized');
Draw component planes 1,2 and 3 from sMap. Use original
data scaling in colorbars.
- sMap=som_autolabel(sMap,sData); som_addlabels(sMap, 'all', 3);
Label the 3rd subplot in the figure with the labels in the sMap.
- handle2=findobj('Tag','Lab'); set(handle2, 'FontSize', 20);
Enlarge the font size afterwards.
som_addtraj tags the objects with the string 'Traj'.
- sam=som_sammon(sMap, 3, 50); som_showgrid(sMap, sam);
First calculate a 3D Sammon's projection and then visualize it.
- som_showgrid(sMap);
Show the map grid of the map.
- som_profile(sMap,'BAR_AXIS_OFF');
Show the model vectors using bar plot with no axes.
Graphical User Interface
The graphical user interface is an additional component of the SOM
Toolbox. It is provided to make both construction and visualization of
maps easier. The downside is that the GUIs are not as flexible as
command line functions, so experienced users will no doubt prefer to
use the Toolbox from the command line.
Initialization and training
| One tool is used to offer the multitude of initialization and
training options to the user.
|
Related functions: somui_it.m
|
Visualization
| Visualization is also handled with a single tool.
|
Related functions: somui_vis.m
|
somtlbx@mail.cis.hut.fi
Last modified: Fri Dec 19 14:02:51 EET 1997