Frequently Asked Questions

General

Q
What is SOM Toolbox?
A
A software library for Matlab 5 implementing the Self-Organizing Map (SOM) algorithm.

Q
What is Matlab 5?
A
Version 5 of the popular scientific computing environment Matlab. For more information, see the MathWorks home page at http://www.mathworks.com

Q
Who is the Toolbox meant for?
A
SOM Toolbox is meant for people interested in using the Self-Organizing Map in their research and development projects. Some basic knowledge of the SOM algorithm is required to understand what the results mean. The Toolbox has a modular design, so you can build your own functions on top of it and/or modify the existing functions to suit your needs. It is equally suitable for education. We have also tried to make the Toolbox easy to use for beginners (e.g. the functions som_doit.m and som_gui.m).

Q
How does this relate to the Neural Networks toolbox?
A
The Neural Networks toolbox tries to cover the whole field of neural networks, of which the Self-Organizing Map is just one part.
The Neural Networks toolbox also contains an implementation of the Self-Organizing Map algorithm. However, that implementation is neither flexible nor efficient, and it is not really up to the state of the art. If you want to use self-organizing maps, we recommend using SOM Toolbox instead.

Q
What does the SOM Toolbox cost?
A
Nothing. It's free.
Note, however, that it's also copyrighted, so you can use it but not sell it (or even parts of it) onward as your own. If you would like to make a commercial product partly based on the SOM Toolbox, contact us and we'll consider it. Feel free to use it as part of any non-commercial product, as long as you remember to keep our copyright notices intact.

Q
When will the next version come out?
A
Er... someday... perhaps. The current version is 1.0beta, and we may someday release version 1.0, but probably not very soon. We'll be correcting bugs and maybe adding some new functions, but don't hold your breath waiting for it to happen.

Q
What kind of support do you offer?
A
In this early phase, we'll try to fix any serious bugs you might find and offer answers to some basic questions, but eventually we'll start referring everyone to the FAQ and HOW-TO lists. For the moment though, your questions and comments are very welcome!

Q
What kind of environment do I need?
A
The SOM Toolbox should run anywhere Matlab version 5 runs. Unfortunately, this rules out Windows 3.1 and DOS environments. As the algorithms are computationally heavy and use a lot of memory, we recommend as fast a processor and as much memory as you can get. A 486 processor is sufficient, although a...bit...slow. Anyway, try it out and see for yourself.

Q
I have this algorithm that would be a great addition to the Toolbox. What do I do?
A
You have? Great! We have a separate contrib area in the Toolbox reserved for exactly this kind of contribution. You can retain your own copyright when you contribute something. To contribute, just send your algorithm along with any documentation and copyright notices to somtlbx@mail.cis.hut.fi.

Handling data

Q
What should I do with categorical data?
A
Categorical data needs special handling, because the Euclidean norm is not meaningful as a "distance" measure between categories.
One solution is to use the one-of-C scheme: assuming you have C categories, add C components to the data vectors so that each of the new components corresponds to one category. For each data vector, assign the value 1 to the component corresponding to its category and set all other "category components" to zero. This way every pair of different categories lies at the same Euclidean distance (the square root of 2) from each other, so no artificial ordering between categories is introduced.
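To illustrate the one-of-C scheme, here is a minimal Python sketch (not SOM Toolbox code, which is Matlab). The function names one_of_c and append_one_of_c are hypothetical, and the sketch assumes the category labels are sortable (e.g. all strings) so that each category gets a fixed column position.

```python
import math


def one_of_c(labels):
    """Encode category labels with the one-of-C scheme.

    Returns (categories, rows): the sorted distinct categories and, for each
    label, a list of C numbers with 1.0 at its category's position and 0.0
    elsewhere. Labels must be sortable (e.g. all strings).
    """
    categories = sorted(set(labels))
    position = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0.0] * len(categories)
        row[position[label]] = 1.0
        rows.append(row)
    return categories, rows


def append_one_of_c(data, labels):
    """Append the C category components to each numeric data vector."""
    _, encoded = one_of_c(labels)
    return [list(vec) + enc for vec, enc in zip(data, encoded)]


if __name__ == "__main__":
    data = [[1.2, 0.5], [0.3, 2.0], [1.0, 1.1]]
    labels = ["red", "green", "red"]
    print(append_one_of_c(data, labels))
    # -> [[1.2, 0.5, 0.0, 1.0], [0.3, 2.0, 1.0, 0.0], [1.0, 1.1, 0.0, 1.0]]

    # Any two different categories end up exactly sqrt(2) apart:
    _, enc = one_of_c(["a", "b", "c"])
    print(math.dist(enc[0], enc[1]), math.dist(enc[0], enc[2]))
```

The encoded vectors can then be used like any other numeric data. Note that the new components still go through normalization, which affects how strongly the category information weighs in training.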

Q
Which normalization method should I use?
A
Of course, it depends on your data and on what you consider important. By default, 'som_var_norm' is used, which scales each component to zero mean and unit variance. This ensures that each component has approximately equal influence on training. If it is important to separate value ranges that contain many samples more precisely than value ranges that contain only a few samples, use 'som_hist_norm'.
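The difference between the two approaches can be sketched in Python. This is only an illustration under stated assumptions, not the Toolbox's code: var_norm follows the zero-mean, unit-variance description above (using the sample standard deviation, which is an assumption), while rank_norm is a hypothetical rank-based, histogram-equalization-style scaling that shows the idea behind 'som_hist_norm'; the actual 'som_hist_norm' computation may differ.

```python
import statistics


def var_norm(column):
    """Scale one component to zero mean and unit variance.

    Uses the sample standard deviation (an assumption) and needs at least
    two values; a constant component is only centred, to avoid dividing by 0.
    """
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    if std == 0:
        return [x - mean for x in column]
    return [(x - mean) / std for x in column]


def rank_norm(column):
    """Histogram-equalization-style scaling to [0, 1] by rank.

    Each value becomes its rank among the distinct values divided by
    (number of distinct values - 1), so densely populated value ranges are
    spread out and sparsely populated ones are squeezed together.
    """
    distinct = sorted(set(column))
    if len(distinct) == 1:
        return [0.0] * len(column)
    rank = {v: i for i, v in enumerate(distinct)}
    return [rank[x] / (len(distinct) - 1) for x in column]


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0, 100.0]
    print(var_norm(column))   # 1, 2 and 3 end up close together, 100 far away
    print(rank_norm(column))  # evenly spaced: 0, 1/3, 2/3, 1
```

With variance normalization, a single outlier compresses the dense region into a narrow interval. The rank-based version gives the three close values as much room as the outlier, which is the kind of precision trade-off the answer above describes.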

Map training

Q
What kind of training parameters should I use?
A
If you are not sure which parameters to use, trust the defaults. If you want to optimize, try varying the parameters: try both initialization algorithms, train longer, try a few other neighborhood widths (both initial and final), and try the 'inv' learning rate type.
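To get a feel for what these parameters control, the schedules can be sketched in Python. The formulas below are assumptions chosen for illustration, not necessarily what the Toolbox computes: a linearly decreasing learning rate, an 'inv'-type rate of the form alpha(t) = a / (t + b) (assumed here to fall to 1/100 of its initial value by the end of training), a neighborhood width shrinking linearly from its initial to its final value, and a Gaussian neighborhood. All function names are hypothetical.

```python
import math


def linear_alpha(alpha0, trainlen):
    """Learning rate decreasing linearly from alpha0 towards zero."""
    return [alpha0 * (1 - t / trainlen) for t in range(trainlen)]


def inv_alpha(alpha0, trainlen):
    """'inv'-type learning rate alpha(t) = a / (t + b).

    Assumption: a and b are chosen so that alpha(0) == alpha0 and the rate
    has fallen to alpha0 / 100 at the last step. Needs trainlen >= 2.
    """
    b = (trainlen - 1) / 99
    a = b * alpha0
    return [a / (t + b) for t in range(trainlen)]


def radius_schedule(r_ini, r_fin, trainlen):
    """Neighborhood width shrinking linearly from r_ini to r_fin."""
    if trainlen == 1:
        return [float(r_ini)]
    step = (r_fin - r_ini) / (trainlen - 1)
    return [r_ini + step * t for t in range(trainlen)]


def gaussian_weight(grid_dist, radius):
    """Gaussian neighborhood weight of a map unit at the given grid
    distance from the best-matching unit."""
    return math.exp(-grid_dist ** 2 / (2 * radius ** 2))


if __name__ == "__main__":
    lin = linear_alpha(0.5, 100)
    inv = inv_alpha(0.5, 100)
    print(lin[10], inv[10])  # 'inv' has already dropped much further
    print(radius_schedule(3, 1, 5))  # -> [3.0, 2.5, 2.0, 1.5, 1.0]
```

The 'inv' type makes large corrections only during the first steps and then fine-tunes for a long time, while a larger initial neighborhood width spreads each update over more of the map early on. Comparing a few such settings is what the answer above suggests.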

Visualization

Q
Why doesn't visualization work for map grids with more than two dimensions?
A
Well, that kind of visualization could be done using slices, but it wouldn't be terribly illustrative, so we decided to leave it out. Later on we might add some visualization tools for 3D map grids, but don't count on it.


somtlbx@mail.cis.hut.fi