Self-Organizing Map (SOM), Food data
This document contains information about and instructions for
the SOM-based project work. This assignment will acquaint you
with the Self-Organizing Map (SOM) and show you where the SOM is applicable
and how it works. To make the work interesting, you will study real-world
data, a food database. A secondary
objective of this work is to guide you in creating an understandable project
document. The first part of this document describes the phases of the
project, the second the requirements of the project report, the third some
technical aspects, and the last part the conclusions.
The phases of the work
First, you should prepare yourself with this environment:
MATLAB. On UNIX platforms, you may need the 'use matlab' command.
SOM Toolbox for MATLAB. This toolbox is available on the
Maarintalo jewel machines of the HUT. To use the toolbox there, you must
enter addpath /p/appl/math/matlab/hut/SOM to add it to the MATLAB path. It
is also available at http://www.cis.hut.fi/projects/somtoolbox/,
from where it can be downloaded and installed free of charge.
Then you should familiarize yourself with the SOM algorithm:
Run the som_demo1 script of the SOM Toolbox to get an idea of how the SOM works.
More theoretical approaches are available at http://www.cis.hut.fi/projects/somtoolbox/theory/
A simple approach is to think of the SOM as an elastic graph. The nodes
are attracted to the data points and the arcs are rubber bands trying to hold
the graph together...
Third, do the required programming tasks:
Use the basescript as the basis for modifications.
Modify the three parameters stated at the beginning of the script and study
how the behavior changes.
Use this script as a basis to run the SOM on the
food data. You will need the data file food.mat.
Save it to your MATLAB work directory with the name food.mat. You
will also need the documentation of the variables.
The script file contains some basic operations and comments on what needs to be done; read
through it for the details.
Fourth, report your ideas and discoveries.
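The elastic-graph picture of the SOM can be made concrete with a minimal sketch of sequential SOM training on a one-dimensional chain of nodes. This is plain MATLAB written for illustration only; it is not the SOM Toolbox implementation and not the basescript. The toy data, the variable names and the parameter values (nNodes, alpha0, sigma0) are all assumptions and need not match the three parameters of the basescript.

```matlab
% Minimal sequential SOM on a 1-D chain of nodes (illustrative sketch only).
% All names and values below are assumptions made for this example.
nNodes = 10;      % number of nodes in the chain
nSteps = 2000;    % number of training steps
alpha0 = 0.5;     % initial learning rate (strength of the attraction)
sigma0 = 3;       % initial neighborhood radius ("stiffness" of the rubber bands)

t = linspace(0, pi, 200)';
D = [cos(t), sin(t)];                               % toy data: points on a half circle
M = [linspace(-1, 1, nNodes)', zeros(nNodes, 1)];   % initial node positions on a line

% Mean distance from each data point to its nearest node (quantization error)
pairSq = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
qErr   = @(C) mean(sqrt(max(min(pairSq(D, C), [], 2), 0)));

qeBefore = qErr(M);
for step = 1:nSteps
    frac  = (step - 1) / nSteps;
    alpha = alpha0 * (1 - frac);                    % learning rate decays linearly
    sigma = sigma0 * (1 - frac) + 0.5 * frac;       % radius shrinks from sigma0 towards 0.5
    x = D(randi(size(D, 1)), :);                    % pick a random data point
    [~, bmu] = min(sum(bsxfun(@minus, M, x).^2, 2));      % best-matching node
    h = exp(-((1:nNodes)' - bmu).^2 / (2 * sigma^2));     % neighborhood along the chain
    M = M + alpha * bsxfun(@times, h, bsxfun(@minus, x, M));  % pull nodes towards x
end
qeAfter = qErr(M);
fprintf('Quantization error before: %.3f, after: %.3f\n', qeBefore, qeAfter);
```

The winning node is pulled towards the data point (the attraction), and its chain neighbors are dragged along with a weight h that falls off with distance on the chain (the rubber bands). Changing alpha0 or sigma0 and re-running shows how the chain settles differently.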
The project report
Create a cover sheet with:
course code and course name.
your own name, department, email address and student number
the date when you returned the report.
The first section should be an introduction stating:
what is the work described in this report.
what is the interesting result
what background is required to understand it
how it will be described
The second section should be an introduction to the SOM algorithm:
first, in a few hundred words, give an overview of the SOM algorithm
then, in one or two pages, describe how the parameters of the SOM influence
the result of the training (as can be seen from the basescript). Organize the document and give the titles as
you see appropriate.
The third section should also contain one or two pages; the last section is for conclusions. In the third section, describe the following:
based on the U-matrix, how many "food types" (for example fruit etc.) can be identified?
give a qualitative description of these "food types" based on the above count
and the component planes. Give an example of an actual food belonging to each
type. This food may be real or invented.
perform a projection of the map and data with one of the following methods,
based on the more detailed instructions in the script file. The main
idea here is to see how the map "stretches", as this cannot really be
observed in the high-dimensional space.
Select the projection method based on the second-to-last digit
of your student ID (e.g. 4 in 12345a). The projections are PCA
(Principal Component Analysis), Sammon's mapping and CCA (Curvilinear
Component Analysis). So if your number is
- 1-4 you perform Principal Component Analysis on both the trained SOM
and the data, plot them in the same picture and have a
look. Explain what effects you see. Also tell how much has been
"explained" (that is, the ratio of the variance along the plotted components
to the total variance along all components). You might want to use the function
princomp, which makes it easy to also plot the data.
- 5-7 you perform Sammon's mapping. Briefly explain the main idea
(what the "criterion" for the projection is) and do the plot. Explain
what you see. The SOM Toolbox includes the function sammon, which
is capable of performing this mapping.
- 8-0 you perform CCA. Briefly explain the main idea (what the
"criterion" for the projection is) and do the plot. Explain what you
see. The function cca might be of some assistance.
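For the PCA option, the share of variance "explained" by the plotted components can be computed from the singular values of the centered data. The following is a hedged sketch that uses only base MATLAB (svd) rather than princomp. The matrices D and M are toy stand-ins for the food data and the SOM codebook vectors and are assumptions, as is the choice to project the map with the mean and directions estimated from the data.

```matlab
% PCA projection of data and map units into the same 2-D picture (sketch).
% D and M are toy stand-ins (assumptions), not the food data or a trained SOM.
t = linspace(0, 1, 50)';
D = [t, 2*t + 0.1*sin(20*t), 0.05*cos(15*t)];   % 50 samples, 3 variables
M = D(1:5:end, :);                               % pretend codebook: 10 "map units"

mu = mean(D, 1);
Dc = bsxfun(@minus, D, mu);                      % center the data
[~, S, V] = svd(Dc, 'econ');                     % columns of V: principal directions
vars = diag(S).^2 / (size(D, 1) - 1);            % variance along each component
explained = sum(vars(1:2)) / sum(vars);          % share carried by the first two

P_data = Dc * V(:, 1:2);                         % project the data
P_map  = bsxfun(@minus, M, mu) * V(:, 1:2);      % project the map with the same mean and directions

plot(P_data(:, 1), P_data(:, 2), 'k.', P_map(:, 1), P_map(:, 2), 'ko');
legend('data', 'map units');
fprintf('First two components explain %.1f%% of the variance\n', 100 * explained);
```

Using one common set of axes for both the data and the map is what makes the two point sets comparable in the same plot.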
Update 17.5.2006: Please note that yet another problem has been discovered with the latest versions of
MATLAB. If you get an error message
along the lines of ""error" previously appeared to be used as a function or command, conflicting with its use here as the name
of a variable.", or display issues, please see the project main page for more information on workarounds. -MA
Of course, if you are interested, you are free to make all the plots
and examine the differences, but this is by no means required for the
report to be accepted (and since it doesn't get graded beyond
accepted/failed it'll only be for your own information... :) )
(Note: these are all projections from a higher dimension, so they
don't give an "entirely accurate" picture, but seeing the SOM structure
in a plottable number of dimensions makes examining it much more intuitive.)
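How far a projection is from "entirely accurate" can be quantified by comparing pairwise distances before and after projecting. One common measure is Sammon's stress. The sketch below is plain MATLAB and does not call the toolbox function sammon; the toy points X and the naive projection Y (dropping the third coordinate) are assumptions chosen only to keep the example small.

```matlab
% Distance distortion of a projection, measured with Sammon's stress (sketch).
% X and Y are toy values (assumptions); the points in X must be distinct.
X = [0 0 0; 1 0 0; 0 1 0; 1 1 1];                % points in the original 3-D space
Y = X(:, 1:2);                                   % naive 2-D projection: drop 3rd coordinate

n = size(X, 1);
[p, q] = find(triu(ones(n), 1));                 % index pairs with p < q
dX = sqrt(sum((X(p, :) - X(q, :)).^2, 2));       % pairwise distances in original space
dY = sqrt(sum((Y(p, :) - Y(q, :)).^2, 2));       % pairwise distances in the projection

% Stress is 0 when all pairwise distances are preserved and grows with distortion
sammonStress = @(dOrig, dProj) sum((dOrig - dProj).^2 ./ dOrig) / sum(dOrig);
stress = sammonStress(dX, dY);
fprintf('Sammon stress of the naive projection: %.4f\n', stress);
```

Dividing each squared error by the original distance gives small distances more weight, so the measure emphasizes how well local neighborhoods are preserved.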
use pictures as you see fit, but do use black-and-white pictures to save
the ink of the printers. The picture can be transformed to black-and-white
using the MATLAB-command 'colormap(gray)'.
In the conclusions, state the major result of your study by picking the best items
of the above treatment
say what you would study next. For example, what data you would like to
- Remember to include any MATLAB (.m) files you wrote (or edited)
for this assignment in your submission (as an appendix, for example)!
The required tools
The SOM Toolbox for MATLAB. This toolbox is a library of MATLAB code
designed to compute and visualize SOMs. It is available on a
number of HUT computers, at least the 'jewel' workstations
physically located in Maarintalo. If you want to do the work there,
you must use 'addpath /p/appl/math/matlab/hut/SOM' to take the package
into use. If necessary, it can also be downloaded from http://www.cis.hut.fi/projects/somtoolbox/download/.
Use the latest version.
- If you are unfamiliar with the projection methods (PCA, Sammon's
mapping, CCA), you can find a brief description of the latter two on the
Neural Network Group CDA page. This might clarify them a
bit. A good explanation of PCA can be found in Jaakko
Hollmen's Master's thesis. A lot of information can also be found with
a simple web search, if you do not have a book handy!
Consider the assistant a resource as well. The course assistant, Matti
Aksela, can help you with this project. Also available to you is Mr. Sampsa
Laine, the creator of this document.
Conclusions
This work has two objectives: familiarization with the SOM and the creation
of an understandable project report. Do the study with gusto. The reports
are graded on content and clarity of presentation. According to F.
Pappa, 'Vain oleellinen on tärkeää' (= 'Only the essential
is of importance').
Wednesday, 17-May-2006 23:28:37 EEST