Self-Organizing Map (SOM), Food data

Introduction

This document contains information about, and instructions for, the SOM-based project work. This assignment will get you acquainted with the Self-Organizing Map (SOM), show you where the SOM is applicable, and explain how it works. To make the work interesting, you will study real-world data: a food database. A secondary objective of this work is to guide you in creating an understandable project document. The first part of this document describes the phases of the project, the second the requirements of the project report, the third the required tools, and the last part the conclusions.

The phases of the work

  1. First, set up the following environment:
    • MATLAB. On UNIX platforms, you may need to run the 'use matlab' command first.
    • The SOM Toolbox for MATLAB. The toolbox is available on the HUT 'jewel' workstations in Maarintalo. To take it into use there, enter addpath /p/appl/math/matlab/hut/SOM in MATLAB. It is also available at http://www.cis.hut.fi/projects/somtoolbox/, from where it can be downloaded and installed free of charge.
  2. Then, familiarize yourself with the SOM algorithm:
    • Run the som_demo1 script of the SOM Toolbox to get an idea of how the SOM works.
    • More theoretical material is available at http://www.cis.hut.fi/projects/somtoolbox/theory/
    • A simple way to think of the SOM is as an elastic graph: the nodes are attracted to the data points, and the arcs are rubber bands trying to hold the graph together.
  3. Third, do the required programming tasks
    • Use the basescript as the basis for modifications
    • Modify the three parameters stated at the beginning of the script and study how the behavior of the map changes
    • Use this script as the basis for running the SOM on the food data. You will need the data file food.mat; save it to your MATLAB work directory under the name food.mat. You will also need the documentation of the variables. The script file contains some basic operations and comments on what needs to be done; read through it for the details.
  4. Fourth, report your ideas and discoveries.
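The elastic-graph picture in phase 2 corresponds to the basic sequential SOM rule. For each sample, find the best-matching unit (BMU), then move it and its grid neighbours towards the sample, using a learning rate and a neighbourhood radius that both shrink over time. The sketch below illustrates this in plain Python. It is not SOM Toolbox code; the toolbox has its own training functions, such as som_seqtrain and som_batchtrain. This document does not name the basescript's three parameters. The parameters used here (map size rows/cols, training length epochs, initial learning rate lr0 and initial radius radius0) are assumed stand-ins. The linear decay schedules and the Gaussian neighbourhood are one common choice among several.

```python
import math
import random


def train_som(data, rows, cols, epochs, lr0=0.5, radius0=None, seed=0):
    """Sequential SOM training on a rows x cols rectangular grid (a sketch).

    Assumed parameters (stand-ins, not the basescript's actual names):
    rows, cols: map size; epochs: passes over the data;
    lr0: initial learning rate; radius0: initial neighbourhood radius.
    """
    rng = random.Random(seed)
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each node's weight vector with a randomly chosen data sample.
    weights = [list(rng.choice(data)) for _ in nodes]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps                      # 0 at start, -> 1 at end
            lr = lr0 * (1.0 - frac)                        # linear decay towards 0
            radius = 1.0 + (radius0 - 1.0) * (1.0 - frac)  # shrinks towards 1
            # Best-matching unit (BMU): the node whose weights are closest to x.
            bmu = min(range(len(nodes)), key=lambda k: math.dist(weights[k], x))
            br, bc = nodes[bmu]
            # Pull the BMU and its grid neighbours towards x. The Gaussian
            # neighbourhood h ties neighbouring nodes together (the "rubber bands").
            for k, (r, c) in enumerate(nodes):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
            step += 1
    return nodes, weights


def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its BMU."""
    return sum(min(math.dist(w, x) for w in weights) for x in data) / len(data)


# Made-up demo data (NOT food.mat): two tight clusters in 3-D.
rng = random.Random(1)
data = ([[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(50)]
        + [[rng.gauss(5.0, 0.1) for _ in range(3)] for _ in range(50)])
nodes, weights = train_som(data, rows=4, cols=4, epochs=10)
qe = quantization_error(data, weights)
print("quantization error:", round(qe, 3))
```

Changing rows/cols, epochs or radius0 and re-running gives a quick feel for the parameter effects the report asks about. For example, a larger neighbourhood radius keeps the map stiffer and typically raises the quantization error.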

The project report

  1. Create a cover sheet with
    • course code and course name.
    • your own name, department, email address and student number
    • the date when you returned the report.
  2. The first section should be an introduction stating
    • what work is described in this report
    • what the interesting result is
    • what background is required to understand it
    • how it will be described
  3. The second section should be an introduction to the SOM algorithm
    • in a few hundred words, give an overview of the SOM algorithm
    • then, in one or two pages, describe how the parameters of the SOM influence the result of the training (as can be seen from the basescript). Organize the document and choose the section titles as you see appropriate.
  4. The third section should also contain one or two pages, covering the following:
    • based on the U-matrix, how many "food types" (for example, fruit) can be identified?
    • give a qualitative description of these "food types" based on the above count and the component planes. Give an example of a food belonging to each type; this food may be real or invented.
    • perform a projection of the map and data with one of the following methods, following the more detailed instructions in the script file. The main idea here is to see how the map "stretches", as this cannot really be observed in the high-dimensional space. Select the projection method based on the second-to-last digit of your student number (e.g. 4 in 12345a). The projections are PCA (Principal Component Analysis), Sammon's mapping and CCA (Curvilinear Component Analysis). So if your digit is
      • 1-4, perform Principal Component Analysis on both the trained SOM and the data, plot them in the same figure and have a look. Explain what effects you see. Also report how much has been "explained", i.e. the ratio of the variance along the plotted components to the variance along all components. You might want to use the function princomp, which also makes it easy to plot the data.
      • 5-7, perform Sammon's mapping. Briefly explain the main idea (what is the "criterion" for the projection) and make the plot. Explain what you see. The SOM Toolbox includes the function sammon, which can perform this mapping.
      • 8, 9 or 0, perform CCA. Briefly explain the main idea (what is the "criterion" for the projection) and make the plot. Explain what you see. The function cca might be of some assistance.

      Update 17.5.2006: Please note that yet another problem has been discovered with the latest versions of MATLAB. If you get an error message along the lines of '"error" previously appeared to be used as a function or command, conflicting with its use here as the name of a variable', or display issues, please see the project main page for workarounds. -MA

      Of course, if you are interested, you are free to make all the plots and examine the differences, but this is by no means required for the report to be accepted (and since the report is not graded beyond accepted/failed, it will only be for your own information... :) ). (Note: these are all projections from a higher dimension, so they do not give an "entirely accurate" picture, but seeing the SOM structure in a plottable number of dimensions makes examining it much more intuitive!)
    • use pictures as you see fit, but do use black-and-white pictures to save printer ink. Colormap-based figures, such as the U-matrix and the component planes, can be converted to grayscale with the MATLAB command 'colormap(gray)'.
  5. The last section is for conclusions
    • state the major result of your study by picking the best items from the treatment above
    • say what you would study next, for example what data you would like to study
    • Remember to include any MATLAB (.m) files you wrote (or edited) for this assignment in your submission (as an appendix, for example)!
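The U-matrix in the third section shows distances between neighbouring map units. Units inside a cluster are close to their neighbours, while large distances form "walls" between clusters, and these walls are what you count when identifying food types. The sketch below computes a simplified U-matrix in Python, as an assumption for illustration. It uses a rectangular grid and averages each unit's distance to its four grid neighbours. The SOM Toolbox's own U-matrix (som_umat, displayed with som_show) uses the map's actual lattice, which is typically hexagonal, and includes extra cells for the distances between units, so its values will differ. The 2 x 3 map weights below are made up.

```python
import math


def u_matrix(weights, rows, cols):
    """Simplified U-matrix for a rectangular rows x cols map (a sketch).

    weights[r * cols + c] is the weight vector of the unit at grid position (r, c).
    Each output cell holds the mean Euclidean distance from that unit's weights
    to the weights of its 4-connected grid neighbours (the map must have >= 2 units).
    """
    umat = []
    for r in range(rows):
        row = []
        for c in range(cols):
            w = weights[r * cols + c]
            dists = [
                math.dist(w, weights[nr * cols + nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            row.append(sum(dists) / len(dists))
        umat.append(row)
    return umat


# Hypothetical 2 x 3 map: the two left columns sit near (0, 0) and the right
# column near (5, 5), so a "wall" should appear between columns 1 and 2.
weights = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
           [0.0, 0.1], [0.1, 0.1], [5.0, 5.1]]
umat = u_matrix(weights, rows=2, cols=3)
for row in umat:
    print(" ".join(f"{v:5.2f}" for v in row))
```

The left column gets small values (cluster interior). The two right-hand columns get large values because they border the wall, which is how a U-matrix reveals the number of clusters.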
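For the PCA option (digits 1-4), the "explained" fraction is the sum of the variances along the plotted principal components divided by the total variance. Written with the eigenvalues l_1 >= l_2 >= ... of the data's covariance matrix, this is (l_1 + ... + l_k) / (l_1 + ... + l_d). In MATLAB, the third output of princomp holds these eigenvalues, so cumsum(latent)/sum(latent) gives the ratio directly. The Python sketch below computes the same ratio without libraries. It finds the largest eigenvalues by power iteration with deflation, a simple method chosen for illustration and not what princomp does internally. The four-point demo data set is made up so that the answer can be checked by hand: its variance is 8/3 along x and 2/3 along y, so the first component explains 0.8.

```python
import random


def covariance(data):
    """Sample covariance matrix (divides by n - 1) of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvalues(cov, k, iters=500, seed=0):
    """Largest k eigenvalues of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    vals = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        lam = 0.0
        for _ in range(iters):
            av = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(t * t for t in av) ** 0.5
            if norm == 0.0:  # no variance left to explain
                lam = 0.0
                break
            v = [t / norm for t in av]
            # Rayleigh quotient v^T A v estimates the current eigenvalue.
            lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        vals.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vals


def explained_variance_ratio(data, k):
    """Share of the total variance captured by the first k principal components."""
    cov = covariance(data)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    return sum(top_eigenvalues(cov, k)) / total


# Made-up data: variance 8/3 along x and 2/3 along y.
demo = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
ratio1 = explained_variance_ratio(demo, 1)
print(round(ratio1, 3))  # prints 0.8
```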

The required tools

  • The MATLAB-software.
  • The SOM Toolbox for MATLAB. This toolbox is a library of MATLAB functions for computing and visualizing SOMs. It is available on a number of HUT computers, at least on the 'jewel' workstations physically located in Maarintalo. If you want to do the work there, use 'addpath /p/appl/math/matlab/hut/SOM' to take the package into use. If necessary, it can also be downloaded from http://www.cis.hut.fi/projects/somtoolbox/download/. Use the latest version.
  • If you are unfamiliar with the projection methods (PCA, Sammon's mapping, CCA), you can find a brief description of the latter two on the UCL Neural Network Group CDA page, which might clarify them a bit. A good explanation of PCA can be found in Jaakko Hollmen's Master's thesis. A lot more information can be found with a simple web search if you do not have a book handy!
  • Consider the course assistant a resource as well: Matti Aksela can help you with this project. Mr. Sampsa Laine, the creator of this document, is also available to you.
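Sammon's mapping and CCA both place points in a low-dimensional space so as to minimize a cost that compares input-space distances D_ij with output-space distances d_ij. They differ in how they weight the errors.

Sammon's stress is E = (1 / sum D_ij) * sum (D_ij - d_ij)^2 / D_ij, with both sums over all pairs i < j. Because each error is divided by D_ij, distances that are small in the input space count more.

CCA weights each squared error by a decreasing function F(d_ij, lambda) of the output distance. Points that end up close in the projection must keep their distances, while long distances may be distorted, which lets the projection "unfold" curved structure.

The Python sketch below only evaluates the two costs for given projections. It does not compute the projections, which is what the sammon and cca functions mentioned above are for. The step function F(d) = 1 if d <= lambda, else 0, is one common choice and an assumption here. The point sets are made up.

```python
import math


def pairwise(points):
    """Euclidean distance for every pair (i, j) with i < j."""
    n = len(points)
    return {(i, j): math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}


def sammon_stress(high, low):
    """Sammon's stress; assumes no two input points coincide (all D_ij > 0)."""
    D, d = pairwise(high), pairwise(low)
    return sum((D[p] - d[p]) ** 2 / D[p] for p in D) / sum(D.values())


def cca_cost(high, low, lam):
    """CCA cost with an assumed step weighting: pairs count only if d_ij <= lam."""
    D, d = pairwise(high), pairwise(low)
    return sum((D[p] - d[p]) ** 2 for p in D if d[p] <= lam)


# Made-up example: four collinear points in 3-D and two 2-D projections.
high = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
exact = [[0, 0], [1, 0], [2, 0], [3, 0]]         # preserves all distances
squashed = [[0, 0], [0.5, 0], [1, 0], [1.5, 0]]  # halves all distances
print(sammon_stress(high, exact), sammon_stress(high, squashed))  # prints 0.0 0.25
print(cca_cost(high, squashed, lam=0.6))  # prints 0.75 (only the 3 nearest pairs count)
```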

Conclusions

This work has two objectives: familiarization with the SOM and the creation of an understandable project report. Do the study with gusto. The reports are assessed on their content and clarity of presentation. As F. Pappa puts it, 'Vain oleellinen on tärkeää' ('Only the essential is important').



http://www.cis.hut.fi/Opinnot/T-61.261/harjtyo/newsom/somb.shtml
matti.aksela@hut.fi
Wednesday, 17-May-2006 23:28:37 EEST