The Self-Organizing Map (SOM) is a versatile tool for exploring data sets. It is an effective clustering method with excellent visualization capabilities. These include techniques which use the weight vectors of the SOM to give an informative picture of the data space, and techniques which use data projections to compare data vectors or whole data sets with each other. The visualization capabilities make the SOM a valuable tool in data summarization and in consolidating the discovered knowledge. The SOM can also be used for regression and modeling, or as a preprocessing stage for other methods.
As part of this work, a prototype of a data mining tool was implemented. The ENTIRE program is a much-needed improvement in usability over the command-based program package SOM_PAK. However, to really make use of the capabilities of the SOM, such a tool should be integrated into an existing data mining or computing environment such as Matlab by MathWorks, Inc. For a generic data mining environment the list of the very basic operations could be as follows:
The methods and tools presented in this work were used to analyze the pulp and paper industry worldwide, and the Scandinavian industry in more detail. The hierarchical SOM was used to combine data from different areas. Such use of multiple interpretation layers introduces some additional error to the process, but on the other hand it provides a more structured solution to data fusion than simple concatenation of feature vectors.
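The layered data fusion described above can be illustrated with a minimal sketch. It assumes one particular hierarchical arrangement, which is not necessarily the one used in this work: each group of features gets its own first-level map, and the grid coordinates of the best-matching units (BMUs) become the input of a second-level map. The trainer `train_som`, the helper `bmu_coords`, the map sizes and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, seed=0):
    """Train a small SOM with sequential updates (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen data samples.
    codebook = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Grid coordinates (row, col) of every map unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            alpha = 0.5 * (1.0 - frac)                          # decaying learning rate
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
            # Gaussian neighbourhood centred on the BMU, measured on the grid.
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            codebook += alpha * h[:, None] * (x - codebook)
            step += 1
    return codebook, grid

def bmu_coords(data, codebook, grid):
    """Return the grid coordinates of the BMU of every data vector."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return grid[np.argmin(dists, axis=1)]

# Two hypothetical feature groups describing the same 60 objects.
rng = np.random.default_rng(1)
group_a = rng.normal(size=(60, 3))
group_b = rng.normal(size=(60, 4))

# First level: one map per feature group.
cb_a, grid_a = train_som(group_a, 4, 4)
cb_b, grid_b = train_som(group_b, 4, 4)

# Second level: each object is described by its position on the first-level maps.
fused = np.hstack([bmu_coords(group_a, cb_a, grid_a),
                   bmu_coords(group_b, cb_b, grid_b)])
cb_top, grid_top = train_som(fused, 5, 5)
```

In this arrangement each object reaches the upper map only through its position on the lower maps, so the quantization of the first level is one plausible source of the additional error mentioned above. In exchange, each feature group is interpreted on its own map before the groups are combined.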
The results were encouraging. However, much work is still needed regarding the postprocessing stage and the interpretation of results. The analysis in this work was performed by hand. It was both time-consuming and inaccurate, as it was based on visual inspection rather than on exact measures from the SOM. The development and automated use of algorithms that cluster the units of the SOM will be an essential part of future work. Such clustering should not be based only on the distance matrix of the SOM, but also on the rate of change in the values of individual component planes. This could be accomplished with hierarchical maps or with fuzzy interpretation rules.
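The two quantities named above, the distance matrix and the rate of change of the component planes, can both be computed directly from the map's weight vectors. The following is a minimal sketch, assuming a rectangular grid with 4-connected neighbours and a codebook stored as an array of shape (rows, cols, dim). The function names `distance_matrix` and `component_change` are hypothetical, and the toy codebook is constructed only to make the output easy to check.

```python
import numpy as np

def distance_matrix(codebook):
    """Mean distance from each unit to its 4-connected grid neighbours."""
    rows, cols, _ = codebook.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    # Distances between vertically adjacent units.
    d_v = np.linalg.norm(codebook[1:] - codebook[:-1], axis=2)
    total[1:] += d_v
    count[1:] += 1
    total[:-1] += d_v
    count[:-1] += 1
    # Distances between horizontally adjacent units.
    d_h = np.linalg.norm(codebook[:, 1:] - codebook[:, :-1], axis=2)
    total[:, 1:] += d_h
    count[:, 1:] += 1
    total[:, :-1] += d_h
    count[:, :-1] += 1
    return total / count

def component_change(codebook):
    """Gradient magnitude of every component plane, shape (rows, cols, dim)."""
    g_rows, g_cols = np.gradient(codebook, axis=(0, 1))
    return np.hypot(g_rows, g_cols)

# Toy codebook: component 0 steps from 0 to 1 between columns 2 and 3,
# component 1 is constant over the whole map.
codebook = np.zeros((4, 6, 2))
codebook[:, 3:, 0] = 1.0
codebook[:, :, 1] = 5.0

umat = distance_matrix(codebook)
change = component_change(codebook)
```

On the toy map, `umat` is non-zero only in the two columns on either side of the step, and `change` shows that the border is caused entirely by component 0. A clustering rule that looks at both arrays could therefore report not only where a cluster border lies but also which variables produce it, which the distance matrix alone cannot do.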
All in all, the many abilities of the SOM, together with its robustness and flexibility, make it a prime tool in knowledge discovery and data mining.