Data Mining Techniques Based on the Self-Organizing Map

Thesis for the degree of Master of Science in Engineering at the Helsinki University of Technology, Department of Engineering Physics and Mathematics, by Juha Vesanto.

Abstract

Keywords: neural network, Self-Organizing Map, data mining, knowledge discovery, pulp and paper industry

Data mining is part of a larger area of recent research in artificial intelligence and information management: knowledge discovery in databases (KDD). The purpose of KDD is to find new knowledge in databases whose dimensionality, complexity or sheer amount of data has so far been prohibitively large for human observation alone. Data mining refers to the exploratory phase of knowledge discovery.

The Self-Organizing Map (SOM) is one of the most popular neural network models. The SOM quantizes the data space formed by the training data and simultaneously performs a topology-preserving projection of the data space onto a regular two-dimensional grid. The SOM also has excellent visualization capabilities, including techniques that give an informative picture of the data space and techniques for comparing data vectors or whole data sets with each other. The SOM can also be used for clustering, classification and modeling. These versatile properties make the SOM a valuable tool in data mining and knowledge discovery.
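The two roles described above, quantization and projection onto a grid, can be illustrated with a minimal sketch. It is not the tool implemented in this work: it assumes the common online SOM update rule with a Gaussian neighborhood, linearly decaying learning rate and neighborhood radius, and a rectangular grid. The function names, parameter defaults and toy data are all hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the prototype closest to x (quantization)."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small SOM with the online update rule (illustrative defaults)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # One prototype vector per grid node, initialised from random data samples.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Squared distance measured on the grid, not in data space:
                # this is what makes neighboring nodes end up with similar prototypes.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy data (assumed for illustration): two well-separated 2-D clusters.
demo_rng = random.Random(1)
cluster_a = [(demo_rng.uniform(-0.5, 0.5), demo_rng.uniform(-0.5, 0.5)) for _ in range(20)]
cluster_b = [(10 + demo_rng.uniform(-0.5, 0.5), 10 + demo_rng.uniform(-0.5, 0.5)) for _ in range(20)]

som = train_som(cluster_a + cluster_b, rows=3, cols=3)
bmu_a = best_matching_unit(som, (0.0, 0.0))
bmu_b = best_matching_unit(som, (10.0, 10.0))
print("grid position for a point in cluster A:", bmu_a)
print("grid position for a point in cluster B:", bmu_b)
```

Each data vector is represented by its nearest prototype, and the grid coordinates of that prototype give the vector's position on the two-dimensional map. Because every update also moves the grid neighbors of the winning node, nearby grid positions tend to represent nearby regions of the data space.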

As part of this work, a SOM-based data mining tool was implemented. The methods and tools presented in this work were used to analyze the pulp and paper industry worldwide, and the Scandinavian industry in more detail, with encouraging results. The analysis of technological data identified 20 major types of pulp and paper mills. For the Scandinavian industry, a hierarchical structure of SOMs was used to combine technological, environmental and economic data.

The work was done in the Laboratory of Computer and Information Science at the Helsinki University of Technology as part of the corporate project Entire in the technology program "Adaptive and Intelligent Systems Applications". The project was financed by Jaakko Pöyry Consulting and the Technology Development Centre of Finland (TEKES).

Author: Juha Vesanto
Title: Data Mining Techniques Based on the Self-Organizing Map
Date: May 26th 1997
Number of pages: 63
School: Helsinki University of Technology
Department: Engineering Physics and Mathematics
Professorship: Tik-61, Computer and Information Science
Supervisor: Professor Olli Simula
Instructor: Licentiate in Technology Petri Vasara


Last updated 27th May '97 by Juha Vesanto