Self-Organizing Map (SOM), County data
Introduction
This document contains the information and instructions for the SOM-based project
work. The assignment acquaints you with the Self-Organizing Map (SOM): where it is
applicable and how it works. To make the work interesting, you will study
real-world data: the similarities and differences between the counties of Finland. A
secondary objective is to guide you in writing an understandable project
report. The first part of this document describes the phases of the project, the second
the requirements for the project report, the third the required tools, and the last part
the conclusions.
The phases of the work
- First, set up the working environment:
  - MATLAB. On UNIX platforms, you may need the 'use matlab' command.
  - The SOM Toolbox for MATLAB. The toolbox is available on the
    Maarintalo jewel machines of the HUT. To use it there, you must
    enter addpath /p/appl/math/matlab/hut/SOM to add it to the MATLAB path. It
    is also available at http://www.cis.hut.fi/projects/somtoolbox/,
    from where it can be downloaded and installed free of charge.
- Second, familiarize yourself with the SOM algorithm:
  - Run the som_demo1 script of the SOM Toolbox to get an idea of how the SOM
    works.
  - More theoretical treatments are available at http://www.cis.hut.fi/projects/somtoolbox/theory/
  - A simple way to think of the SOM is as an elastic graph: the
    nodes are attracted to the data points, and the arcs are rubber bands
    trying to hold the graph together.
- Third, do the required programming tasks:
  - Use the basescript as the basis for your
    modifications.
  - Modify the three parameters stated at the beginning of the script and
    study how the behavior changes.
  - Use this script as a basis to run the SOM on the
    data of the counties of Finland. You will need the data file kunta.mat; save it to your MATLAB work directory
    under the name kunta.mat. You will also need the documentation of the variables. The script file
    contains some basic operations and comments on what needs to be
    done; read through it for the details. If you have any problem downloading
    the data (for example, MATLAB says the file is not binary), try downloading either the
    gzipped or the zipped version, as most
    browsers are very likely to recognize those extensions as binary and download the file correctly even if they
    do not do so for .mat files.
- Fourth, report your ideas and discoveries.
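The elastic-graph picture given above can be made concrete with a small sketch. It is written in plain Python rather than MATLAB so that it runs without the toolbox, and it is not the SOM Toolbox implementation. The function name train_som_chain, the one-dimensional chain of nodes, the linear decay schedules and the default parameter values are all assumptions chosen for illustration.

```python
import math
import random


def train_som_chain(data, n_nodes=5, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D chain of SOM nodes on a list of equal-length vectors.

    Hypothetical helper for illustration only; the schedules and default
    values are assumptions, not those of the SOM Toolbox.
    """
    rng = random.Random(seed)
    # Start every node on a randomly chosen data point.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # attraction weakens over time
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighbourhood narrows over time
        for x in rng.sample(data, len(data)):       # visit the data in random order
            # Best-matching unit (BMU): the node closest to the data point.
            bmu = min(
                range(n_nodes),
                key=lambda i: sum((w - a) ** 2 for w, a in zip(nodes[i], x)),
            )
            for i in range(n_nodes):
                # Nodes near the BMU along the chain are pulled too;
                # this coupling plays the role of the rubber bands.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                nodes[i] = [w + lr * h * (a - w) for w, a in zip(nodes[i], x)]
    return nodes


if __name__ == "__main__":
    points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    for node in train_som_chain(points):
        print([round(v, 2) for v in node])
```

Changing lr0, radius0 and n_nodes in this sketch gives a rough feel for the kind of parameter study the third phase asks for, although the parameters in the actual script may differ.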
The project report
- Create a cover sheet with
  - the course code and course name
  - your own name, department, email address and student number
  - the date when you returned the report
- The first section should be an introduction stating
  - what work is described in the report
  - what background is required to understand it
  - how it will be described
- The second section should be an introduction to the SOM algorithm:
  - in a few hundred words, give an overview of the SOM algorithm
  - then, in one or two pages, describe how the parameters of the SOM
    influence the result of the training (as can be seen from the
    basescript). Organize the document and choose the titles as you see
    fit.
- The third section should also be one or two pages long. In it you should
  - state, based on the U-matrix, how many county types there are in Finland
  - give a qualitative description of these county types based on the above
    count and the component planes. Give an example of a county belonging
    to each county type. This county may be real or invented.
  - perform a projection of the map and the data with one of the following
    methods, following the more detailed instructions in the script
    file. The main idea is to see how the map "stretches", as this
    cannot really be observed in the high-dimensional space.
    Select the projection method based on the second-to-last digit
    of your student ID (i.e. 4 in 12345a). The methods are PCA
    (Principal Component Analysis), Sammon's mapping and CCA (Curvilinear
    Component Analysis). If your digit is
    - 1-4, perform Principal Component Analysis on both the trained SOM
      and the data, plot them in the same picture and have a
      look. Explain what effects you see. Also state how much of the variance has been
      "explained" (that is, the ratio of the variance along the plotted components
      to the variance along all components). You might want to use the function
      princomp, which also makes it easy to plot the data.
    - 5-7, perform Sammon's mapping. Briefly explain the main idea
      (what is the criterion for the projection) and do the plot. Explain
      what you see. The SOM Toolbox includes the function sammon, which
      can perform this mapping.
    - 8, 9 or 0, perform CCA. Briefly explain the main idea (what is the
      criterion for the projection) and do the plot. Explain what you
      see. The function cca might be of some assistance.
    Update 17.5.2006: Please note that yet another problem has been discovered with the latest versions of
    MATLAB. If you get an error message
    along the lines of '"error" previously appeared to be used as a function or command, conflicting with its use here as the name
    of a variable.', or run into display issues, please see the project main page for information on workarounds. -MA
    Of course, if you are interested, you are free to do all three projections
    and examine the differences, but this is by no means required for the
    report to be accepted (and since the report is only graded
    accepted/failed, it will only be for your own information... :) )
    (Note: these are all projections from a higher dimension, so they
    do not give an "entirely accurate" picture, but seeing the SOM structure
    in a plottable number of dimensions makes examining it much more
    intuitive!)
  - use pictures as you see fit, but use black-and-white pictures to
    save printer ink. A picture can be converted to
    black-and-white with the MATLAB command 'colormap(gray)'.
- The last section is for conclusions:
  - state the major result of your study by picking the best
    items from the sections above
  - say what you would study next. For example, what data would you like
    to study: the countries of the world?
- Remember to include any MATLAB (.m) files you wrote (or edited)
  for this assignment in your submission (as an appendix, for example)!
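The "explained" ratio requested for the PCA option can be illustrated with a short calculation: the variance along each principal component is an eigenvalue of the data covariance matrix, and the sum of all eigenvalues equals the trace of that matrix. The sketch below is plain Python, not MATLAB and not the princomp function. The helper names covariance, top_eigenvalues and explained_ratio are hypothetical, and power iteration with deflation is used only because it needs no external libraries.

```python
import math
import random


def covariance(data):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvalues(matrix, k, iters=1000, seed=0):
    """Largest k eigenvalues of a symmetric positive semi-definite matrix.

    Uses power iteration with deflation (an assumption made for
    self-containment; a linear algebra library would normally be used).
    """
    rng = random.Random(seed)
    d = len(matrix)
    mat = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:          # nothing left: remaining eigenvalues are ~0
                break
            v = [c / norm for c in w]
        length = math.sqrt(sum(c * c for c in v))
        v = [c / length for c in v]
        mv = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(a * b for a, b in zip(v, mv))   # Rayleigh quotient
        values.append(lam)
        # Deflate: remove the component just found before the next search.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values


def explained_ratio(data, k=2):
    """Variance along the first k principal components over total variance."""
    cov = covariance(data)
    total = sum(cov[i][i] for i in range(len(cov)))   # trace = sum of all eigenvalues
    return sum(top_eigenvalues(cov, k)) / total
```

For a two-dimensional plot, k=2 gives the share of the total variance that is visible in the picture; the remaining share lies along the components that were dropped.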
The required tools
- The MATLAB software.
- The SOM Toolbox for MATLAB. This toolbox is a library of MATLAB code
  designed to compute and visualize SOMs. It is available on a
  number of HUT computers, at least the 'jewel' workstations
  physically located in Maarintalo. If you want to do the work there,
  you must use 'addpath /p/appl/math/matlab/hut/SOM' to take the package
  into use. If necessary, it can also be downloaded from http://www.cis.hut.fi/projects/somtoolbox/download/.
  Use the latest version.
- If you are unfamiliar with the projection methods (PCA, Sammon's
  mapping, CCA), you can find a brief description of the latter two on
  the UCL Neural Network Group CDA page, which might clarify them a
  bit. A good explanation of PCA can be found in Jaakko
  Hollmen's Master's thesis. A lot of information can also be found with
  a simple web search if you do not have a book handy!
- Consider the assistant a resource as well. The course assistant, Matti Aksela, can help you with
  this project. Also available to you is Mr. Sampsa Laine, the creator
  of this document.
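For readers unfamiliar with Sammon's mapping, its criterion can be stated compactly: the mapping looks for low-dimensional coordinates that minimize Sammon's stress, a sum of squared differences between the original and the projected pairwise distances, each weighted by the inverse of the original distance. The sketch below only evaluates that stress for a given projection. It does not perform the optimization that the toolbox function sammon carries out, and the helper names euclidean and sammon_stress are hypothetical. It is written in plain Python so that it runs without MATLAB.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(original, projected):
    """Sammon's stress between points and their low-dimensional images.

    0 means every pairwise distance is preserved exactly; larger values
    mean the projection distorts the distances more.
    """
    total = 0.0   # sum of the original pairwise distances
    error = 0.0   # weighted squared distance errors
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = euclidean(original[i], original[j])
            if d_orig == 0.0:
                continue   # coincident points carry no distance information
            d_proj = euclidean(projected[i], projected[j])
            total += d_orig
            error += (d_orig - d_proj) ** 2 / d_orig
    if total == 0.0:
        raise ValueError("all original points coincide")
    return error / total


if __name__ == "__main__":
    high = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [3.0, 4.0, 0.0]]
    low = [p[:2] for p in high]          # dropping the constant axis loses nothing
    print(sammon_stress(high, low))
```

Because small original distances receive large weights, the criterion favors preserving local structure, which is one way to explain the main idea in the report.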
Conclusions
This work has two objectives: familiarization with the SOM and the creation
of an understandable project report. Do the study with gusto. The reports
are graded on content and clarity of presentation. According to F.
Pappa, 'Vain oleellinen on tärkeää' ('Only the essential
is of importance').

http://www.cis.hut.fi/teaching/T-61.3030/harjtyo/newsom/soma.shtml
matti.aksela@hut.fi
Wednesday, 17-May-2006 23:28:57 EEST