Self-Organizing Map (SOM), Food data
Introduction
This document contains the information and instructions for
the SOM-based project work. This assignment will get you acquainted
with the Self-Organizing Map (SOM), show you where the SOM is applicable,
and how it works. To make this work interesting, you will study real-world
data, a food database. A secondary
objective of this work is to guide you to create an understandable project
document. The first part of this document describes the phases of the
project, the second the requirements of the project report, the third some
technical aspects and the last part the conclusions.
The phases of the work

First, you should set up the following environment:

MATLAB. On UNIX platforms, you may need the 'use matlab' command.

SOM Toolbox for MATLAB. This toolbox is available on the
Maarintalo jewel machines of the HUT. To use the toolbox there, you must
enter addpath /p/appl/math/matlab/hut/SOM to add it to the MATLAB path. It
is also available at http://www.cis.hut.fi/projects/somtoolbox/,
from where it can be downloaded and installed free of charge.

Second, you should familiarize yourself with the SOM algorithm

Run the som_demo1 script of the SOM Toolbox to get an idea of how the SOM works.

More theoretical approaches are available at http://www.cis.hut.fi/projects/somtoolbox/theory/

A simple approach is to think that the SOM is an elastic graph. The nodes
are attracted to the data points and the arcs are rubber bands trying to hold
the graph together...
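
The elastic-graph picture above can be tried out with a few lines of plain MATLAB. The following is only a minimal sketch of a sequential SOM update on toy two-dimensional data, not the SOM Toolbox implementation; the variable names, the map size, the number of steps, and the learning-rate and neighborhood schedules are all assumptions chosen for illustration.

```matlab
% Minimal sequential SOM sketch in plain MATLAB (no SOM Toolbox needed).
% All names and parameter values below are illustrative assumptions.
D = rand(200, 2);                      % toy data: 200 points in the unit square

gridRows = 5;  gridCols = 5;           % map size (assumed)
[gc, gr] = meshgrid(1:gridCols, 1:gridRows);
G = [gr(:), gc(:)];                    % grid coordinates, one row per node
M = rand(gridRows * gridCols, 2);      % codebook vectors, random initialization

nSteps = 2000;                         % number of training steps (assumed)
alpha0 = 0.5;                          % initial learning rate (assumed)
sigma0 = 2.0;                          % initial neighborhood radius (assumed)

for t = 1:nSteps
    frac  = t / nSteps;
    alpha = alpha0 * (1 - frac);             % linearly decaying learning rate
    sigma = max(sigma0 * (1 - frac), 0.5);   % shrinking neighborhood radius

    x  = D(randi(size(D, 1)), :);            % pick one data point at random
    X  = repmat(x, size(M, 1), 1);           % copy it once per node

    % Best-matching unit: the node whose codebook vector is closest to x
    [~, bmu] = min(sum((M - X).^2, 2));

    % Gaussian neighborhood measured on the map grid (the "rubber bands")
    gd2 = sum((G - repmat(G(bmu, :), size(G, 1), 1)).^2, 2);
    h   = exp(-gd2 / (2 * sigma^2));

    % Pull every node toward x, more strongly the closer it is to the BMU
    M = M + alpha * repmat(h, 1, size(M, 2)) .* (X - M);
end

% Show the data (dots) and the trained nodes (circles)
figure;
plot(D(:, 1), D(:, 2), 'k.');  hold on;
plot(M(:, 1), M(:, 2), 'ko');
```

Changing sigma0 or alpha0 and re-running the sketch gives a rough feeling for how stiff or loose the "rubber bands" are.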

Third, do the required programming tasks

Use the base script as the basis for modifications

Modify the three parameters stated at the beginning of the script and study
how the behavior changes

Use this script as a basis to run the SOM on the
food data. You will need the data file food.mat;
save it to your MATLAB work directory with the name food.mat. You
will also need the documentation of the variables.
The script file contains some basic operations and comments on what needs to be done; read
through it for the details.

Fourth, report your ideas and discoveries.
The project report

Create a cover sheet with

course code and course name.

your own name, department, email address and student number

the date when you returned the report.

The first section should be an introduction stating

what is the work described in this report.

what is the interesting result

what background is required to understand it

how it will be described

The second section should be an introduction to the SOM algorithm

in a few hundred words, give an overview of the SOM algorithm

then, in one or two pages, describe how the parameters of the SOM influence
the result of the training (as can be seen from the base script). Organize the document and choose the titles as
you see appropriate.

The third section should also be one or two pages long and cover the following

based on the U-matrix, how many "food types" (for example fruit etc.) can be identified?

give a qualitative description of these "food types" based on the above count
and the component planes. Give an example of an actual food belonging to each
type. This food may be real or invented.

perform a projection of the map and data with one of the following methods,
based on the more detailed instructions in the script file. The main
idea here is to see how the map "stretches", as this cannot really be
observed in the high-dimensional space.
Select the projection method based on the second-to-last digit
of your student id (i.e. 4 in 12345a). The projections are PCA
(Principal Component Analysis), Sammon's mapping and CCA (Curvilinear
Component Analysis). So if your digit is
 1-4, you perform Principal Component Analysis on both the trained SOM
and the data, plot them in the same picture and have a
look. Explain what effects you see. Also tell how much has been
"explained" (the ratio of the variance along the components
shown to the variance along all components). You might want to use the function
princomp, which makes it easy to also plot the data.
 5-7, you perform Sammon's mapping. Briefly explain the main idea
(what is the "criterion" for the projection) and do the plot. Explain
what you see. The SOM Toolbox includes the function sammon, which
is capable of performing this mapping.
 8-0, you perform CCA. Briefly explain the main idea (what is the
"criterion" for the projection) and do the plot. Explain what you
see. The function cca might be of some assistance.
Update 17.5.2006: Please note that yet another problem has been discovered with the latest versions of
MATLAB. If you are getting an error message
along the lines of ""error" previously appeared to be used as a function or command, conflicting with its use here as the name
of a variable.", or display issues, please see the project main page for more information on workarounds. MA
Of course, if you are interested, you are free to do all three plots
and examine the differences, but this is by no means required for the
report to be accepted (and since it doesn't get graded beyond
accepted/failed it'll only be for your own information... :) )
(Note: these are all projections from a higher dimension, so they
don't give an "entirely accurate" picture, but seeing the SOM structure
in a plottable number of dimensions makes examining it much more
intuitive!)

use pictures as you see fit, but do use black-and-white pictures to save
printer ink. A picture can be converted to black-and-white
using the MATLAB command 'colormap(gray)'.
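
For the PCA option described above, the "explained" ratio and the joint plot of data and map can be sketched in plain MATLAB as follows. This is a hedged illustration only: D and M are toy stand-ins for the data matrix and the SOM codebook (in your own work they would come from the food data and the trained map), and the sketch uses svd instead of princomp only because princomp requires the Statistics Toolbox.

```matlab
% PCA sketch: project data and map nodes onto the first two principal
% components and compute the fraction of variance they explain.
% D and M are toy placeholders (assumptions), not the food data or a real SOM.
D = randn(100, 4) * diag([3 2 1 0.5]);   % n-by-d data matrix (toy)
M = D(1:10, :);                          % m-by-d codebook stand-in (toy)

mu = mean(D, 1);                         % center everything on the data mean
Dc = D - repmat(mu, size(D, 1), 1);
Mc = M - repmat(mu, size(M, 1), 1);

% Economy-size SVD of the centered data: columns of V are the principal axes
[~, S, V] = svd(Dc, 0);
vars = diag(S).^2 / (size(D, 1) - 1);    % variance along each component

% Ratio of the variance along the two plotted components to the total
explained = sum(vars(1:2)) / sum(vars);

Pd = Dc * V(:, 1:2);                     % projected data
Pm = Mc * V(:, 1:2);                     % projected map nodes (same axes)

figure;
plot(Pd(:, 1), Pd(:, 2), 'k.');  hold on;
plot(Pm(:, 1), Pm(:, 2), 'ko');
fprintf('First two components explain %.1f%% of the variance\n', 100 * explained);
```

Note that the map nodes are centered with the mean of the data and projected with the axes computed from the data, so that both appear in the same coordinate system.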

The last section is for conclusions

state what was the major result of your study by picking the best items
of the above treatment

say what you would study next. For example, what data you would like to
study...
 Remember to include any MATLAB (.m) files you wrote (or edited)
for this assignment in your submission (as an appendix, for example)!
The required tools

The MATLAB software.

The SOM Toolbox for MATLAB. This toolbox is a library of MATLAB code
designed to calculate and visualize SOMs. It is available on a
number of computers at the HUT, at least on the 'jewel' workstations
physically located in Maarintalo. If you want to do the work there,
you must use 'addpath /p/appl/math/matlab/hut/SOM' to take the package
into use. If necessary, it can also be downloaded from http://www.cis.hut.fi/projects/somtoolbox/download/.
Use the latest version.
 If you are unfamiliar with the projection methods (PCA, Sammon's
mapping, CCA), you can find a brief description of the latter two on
the UCL Neural Network Group CDA page. This might clarify them
a bit. A good explanation of PCA can be found in Jaakko
Hollmen's Master's thesis. A lot of information can also be found with
a simple web search, if you do not have a book handy!

Consider the assistant also a resource. The course assistant, Matti
Aksela, can help you with this project. Also available to you is Mr. Sampsa
Laine, the creator of this document.
Conclusions
This work has two objectives: familiarization with the SOM and the creation
of an understandable project report. Do the study with gusto. The reports
are graded by content and clarity of presentation. According to F.
Pappa, 'Vain oleellinen on tärkeää' ('Only the essential
is of importance').
http://www.cis.hut.fi/Opinnot/T61.261/harjtyo/newsom/somb.shtml
matti.aksela@hut.fi
Wednesday, 17-May-2006 23:28:37 EEST
