The Self-Organizing Map (SOM), developed by Professor Kohonen, is one of the most popular neural network models. The SOM algorithm is based on unsupervised competitive learning, which means that the training is entirely data-driven and that the neurons of the map compete with each other. Supervised algorithms, like the Multi-Layered Perceptron (MLP), require that the target values for each data vector are known, but the SOM does not have this limitation. Since its introduction in 1981 the SOM has been applied to a wide variety of tasks, ranging from machine vision to full-text analysis and from process control to neurophysiological research.
Knowledge discovery in databases (KDD) is an emerging area of research in artificial intelligence and information management. The purpose of KDD is to find new knowledge in databases in which the dimension, complexity or amount of data has so far been prohibitively large for human observation alone. Some typical tasks of KDD are classification, regression, clustering, summarization and dependency modeling. The algorithms that are employed in these tasks include decision trees and rules, nonlinear regression and classification methods like feed-forward networks and adaptive splines, example-based methods like k-Nearest Neighbors (kNN), graphical dependency models and relational learning.
This work was inspired by the great potential of the SOM in knowledge discovery, especially in its exploratory phase, data mining. The benefits the SOM offers include an approximation of the probability density function of the training data, prototype vectors that best describe the data, and a highly visual approach to investigating the data: using the SOM it is very easy to see rough relations between features and, using trajectories, the behaviour of the underlying system over time.
Although KDD is a relatively new area, two things are already quite clear. Firstly, in KDD humans have the key role. In the foreseeable future the methods do not have a chance of becoming truly autonomous systems. Secondly, no single method has proven both versatile and efficient enough to be successful on its own in the legion of different kinds of tasks and situations arising in KDD. Instead, a data mining environment is needed in which the user is able to try and retry different kinds of approaches and gain insights about the data.
This work was a part of a cooperative project, Entire, between Jaakko Pöyry Consulting (JPC) and the Laboratory of Computer and Information Science at the Helsinki University of Technology. The goal of the project was to implement a data mining tool based on the SOM and to use this tool to form a comprehensive view of the world pulp and paper industry. The work is divided into four main chapters: