Introduction

The Self-Organizing Map (SOM), developed by Professor Kohonen, is one of the most popular neural network models. The SOM algorithm is based on unsupervised competitive learning, which means that the training is entirely data-driven and that the neurons of the map compete with each other. Supervised algorithms, like the Multi-Layered Perceptron (MLP), require that the target values for each data vector are known, but the SOM does not have this limitation. Since its introduction in 1981 the SOM has been applied to a wide variety of tasks, ranging from machine vision to full-text analysis and from process control to neurophysiological research [21].
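To make the idea of unsupervised competitive learning concrete, the following is a minimal sketch rather than the algorithm as presented in chapter 2. It assumes a one-dimensional map, a Gaussian neighbourhood and linearly decaying learning rate and radius; the function names and parameter values are illustrative only. Note that no target values appear anywhere: the units compete for each data vector, and the winner and its map neighbours move towards it.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competition step: index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_step(weights, x, alpha=0.5, radius=1.0):
    """Move the winner and its map neighbours towards x.

    Assumption: units lie on a 1-D map and the neighbourhood is Gaussian
    in the map index, so units far from the winner move very little.
    """
    winner = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
        weights[i] = [wk + alpha * h * (xk - wk) for wk, xk in zip(w, x)]
    return winner


def train(data, n_units=5, epochs=50, seed=0):
    """Train a small 1-D map on unlabeled data (illustrative schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly from 1 towards 0
        alpha = 0.5 * frac                   # learning rate
        radius = max(0.5, 2.0 * frac)        # neighbourhood radius
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            train_step(weights, x, alpha, radius)
    return weights


if __name__ == "__main__":
    data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
    for unit in train(data):
        print([round(v, 3) for v in unit])
```

After training, the weight vectors serve as prototypes of the data, and neighbouring units on the map tend to hold similar prototypes.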

Knowledge discovery in databases (KDD) is an emerging area of research in artificial intelligence and information management. The purpose of KDD is to extract new knowledge from databases in which the dimension, complexity or amount of data has so far been prohibitively large for human observation alone. Some typical tasks of KDD are classification, regression, clustering, summarization and dependency modeling. The algorithms employed in these tasks include decision trees and rules, nonlinear regression and classification methods like feed-forward networks and adaptive splines, example-based methods like k-Nearest Neighbors (kNN), graphical dependency models and relational learning [8].

This work was inspired by the great potential of the SOM in knowledge discovery, especially in its exploratory phase, data mining. The benefits the SOM offers include an approximation of the probability density function of the training data, prototype vectors that best describe the data, and a highly visual approach to investigating the data: using the SOM it is very easy to see rough relations between features and, using trajectories, the behaviour of the underlying system over time.

Although KDD is a relatively new area, two things are already quite clear. Firstly, humans have the key role in KDD. In the foreseeable future the methods have no chance of becoming truly autonomous systems. Secondly, no single method has proven both versatile and efficient enough to be successful on its own in the multitude of different tasks and situations arising in KDD. Instead, a data mining environment is needed in which the user is able to try and retry different kinds of approaches and gain insights about the data [8].

This work was a part of a cooperative project, Entire, between Jaakko Pöyry Consulting (JPC) and the Laboratory of Computer and Information Science at the Helsinki University of Technology. The goal of the project was to implement a data mining tool based on the SOM and to use this tool to form a comprehensive view of the world pulp and paper industry. The work is divided into four main chapters:

Chapter 2 presents the basic Self-Organizing Map algorithm and its properties. The chapter is mainly based on the book "Self-Organizing Maps" by Professor Kohonen [21].
Chapter 3 details different kinds of knowledge discovery methods using the SOM. These have been divided into four basic types: data visualization, map measures, clustering/classification and modeling.
Chapter 4 presents a data mining tool, ENTIRE, which utilizes the SOM. The presentation gives an overview of the properties that are essential when implementing a SOM-based data mining tool.
Chapter 5 applies the SOM to the analysis of the world pulp and paper industry. The primary focus is on analyzing the technology of pulp and paper mills; the secondary focus is on constructing a comprehensive view of the industry that takes technological, economic and environmental data into account.

Juha Vesanto
Tue May 27 12:40:37 EET DST 1997