Publications full details

Journal articles, book chapters

The Self-Organizing Map as a Tool in Knowledge Engineering

Authors: Johan Himberg, Jussi Ahola, Esa Alhoniemi, Juha Vesanto and Olli Simula
Type: Chapter in Pattern Recognition in Soft Computing Paradigm
Description: SOM clustering, 28 pages, 2001
Availability: [PS] (8 MB), [zipped:PS] (869 kB)
Notes:
  • The PostScript file above differs in format but not in content from the published version.
  • Some figures are in color.
Abstract:
The Self-Organizing Map (SOM) is one of the most popular neural network methods. It is a powerful tool in visualization and analysis of high-dimensional data in various engineering applications. The SOM maps the data on a two-dimensional grid which may be used as a base for various kinds of visual approaches for clustering, correlation and novelty detection. In this chapter, we present novel methods that enhance the SOM based visualization in correlation hunting and novelty detection. These methods are applied to two industrial case studies: analysis of hot rolling of steel and continuous pulp process. A research software for fast development of SOM based tools is briefly described.
Bibtex:
@InBook{himberg2001prscp,
  author = 	 {Johan Himberg and Jussi Ahola and Esa Alhoniemi and Juha Vesanto and Olli Simula},
  editor = 	 {Nikhil R.~Pal},
  title = 	 {Pattern Recognition in Soft Computing Paradigm},
  chapter = 	 {The Self-Organizing Map as a Tool in Knowledge Engineering},
  publisher = 	 {World Scientific Publishing},
  year = 	 2001,
  series =	 {Soft Computing},
  pages =	 {38--65}
}

Clustering of the Self-Organizing Map

Authors: Juha Vesanto and Esa Alhoniemi
Type: journal article, in Volume 11(3) of IEEE Transactions on Neural Networks, special issue on data mining
Description: SOM clustering, 12 pages, 2000
Availability: [PS] (1 MB), [zipped:PS] (304 kB)
Notes:
  • The PostScript file above is the final submitted version of the paper (prior to final proofreading and typesetting by IEEE).
  • The proof of Var{mu} = Var{x}/N in section III.C. goes like this: Var{mu} = Var{sum{x}/N} = Var{sum{x}} / N^2 = N Var{x} / N^2 = Var{x} / N
  • Unfortunately we missed the work of Marie Cottrell et al. on hierarchical clustering using SOM while preparing the paper. See for example: "Analyzing and representing multidimensional quantitative and qualitative data: Demographic study of the Rhône valley. The domestic consumption of the Canadian families" (with P. Gaubert, P. Letremy, P. Rousset), in Kohonen Maps, E. Oja and S. Kaski, Eds., Elsevier, Chap. 1, pp. 1-14, 1999.
  • Here's the clown data set used in the paper. The zipped file includes the Matlab code used to generate the data, and the data used in the paper both as ASCII flat file and as SOM Toolbox vs1 data struct (use 'load' command in Matlab to read it in).
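  • The variance proof in the note above can be written out step by step as follows. This is only a sketch: it assumes the N samples x_1, ..., x_N are mutually independent (or at least uncorrelated) and share the same variance Var{x}, which the note does not state explicitly, and the align* environment assumes the amsmath package.

```latex
% Assumptions (not stated in the note): x_1, ..., x_N are independent,
% each with variance Var{x}, and mu = (1/N) * sum_i x_i.
\begin{align*}
\mathrm{Var}\{\mu\}
  &= \mathrm{Var}\Bigl\{\frac{1}{N}\sum_{i=1}^{N} x_i\Bigr\}
   = \frac{1}{N^2}\,\mathrm{Var}\Bigl\{\sum_{i=1}^{N} x_i\Bigr\} \\
  &= \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}\{x_i\}
     && \text{(independence: variances add)} \\
  &= \frac{N\,\mathrm{Var}\{x\}}{N^2}
   = \frac{\mathrm{Var}\{x\}}{N}
\end{align*}
```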
Abstract:
The Self-Organizing Map (SOM) is an excellent tool in exploratory phase of data mining. It projects input space on prototypes of a low-dimensional regular grid which can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, to facilitate quantitative analysis of the map and the data, similar units need to be grouped, i.e., clustered. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure --- first using SOM to produce the prototypes which are then clustered in the second stage --- is found to perform well when compared to direct clustering of the data and to reduce the computation time.
Bibtex:
@Article{vesanto2000tnn,
    author =       {Juha Vesanto and Esa Alhoniemi},
    title =        {Clustering of the Self-Organizing Map}, 
    journal =      {IEEE Transactions on Neural Networks}, 
    publisher =    {IEEE},
    year =         {2000},
    volume =       {11},
    number =       {3},
    month =        {May},
    pages =        {586--600},
    note =         {},
    annote =       {}
}

SOM-Based Data Visualization Methods

Authors: Juha Vesanto
Type: journal article, in Volume 3(2) of IDA
Description: SOM visualization, 21 pages, 1999, errata published on November 22nd 1999
Availability: [PS] (4 MB), [zipped:PS] (570 kB), [Errata:PS,PDF,SDW,DOC] (39 kB)
Notes:
Errata: In the printed and electronic versions of IDA and in versions obtained from here prior to November 19th 1999, the citations in Tables 1 and 2 are almost all wrong and some references are missing from the bibliography. Please download the errata file to obtain corrected Tables and the omitted references. The errata file contains errata sheets in PostScript, PDF, MS-Word 95 and StarWriter 5.0. In the version available here the references have been corrected.
Abstract:
The Self-Organizing Map (SOM) is an efficient tool for visualization of multidimensional numerical data. In this paper, an overview and categorization of both old and new methods for the visualization of SOM is presented. The purpose is to give an idea of what kind of information can be acquired from different presentations and how the SOM can best be utilized in exploratory data visualization. Most of the presented methods can also be applied in the more general case of first making a vector quantization (e.g. k-means) and then a vector projection (e.g. Sammon's mapping).
Bibtex:
@Article{vesanto99ida,
    author =       {Juha Vesanto},
    title =        {SOM-Based Data Visualization Methods},
    journal =      {Intelligent Data Analysis}, 
    publisher =    {Elsevier Science},
    year =         {1999},
    volume =       {3},
    number =       {2},
    month =        {},
    pages =        {111--126},
    note =         {},
    annote =       {}
}

Self-Organizing Map for Data Mining in MATLAB: the SOM Toolbox

Authors: Juha Vesanto, Esa Alhoniemi, Johan Himberg, Kimmo Kiviluoto and Jukka Parviainen
Type: presentation in SNE journal
Description: SOM Toolbox 1.0, 1 page, 1999
Availability: [DOC] (3 MB), [zipped:DOC] (90 kB)
Notes:
SOM Toolbox website
Abstract:
The SOM Toolbox is a free function library for MATLAB 5 implementing the Self-Organizing Map (SOM) algorithm.
Bibtex:
@Article{vesanto99sne,
    author =       {Juha Vesanto and Esa Alhoniemi and Johan Himberg and 
                    Kimmo Kiviluoto and Jukka Parviainen},
    title =        {Self-Organizing Map for Data Mining in MATLAB: 
                    the SOM Toolbox},
    journal =      {Simulation News Europe}, 
    publisher =    {ARGE Simulation News},
    year =         {1999},
    volume =       {},
    number =       {25},
    month =        {March},
    pages =        {54},
    note =         {},
    annote =       {}
}

Analysis and Modeling of Complex Systems Using the Self-Organizing Map

Authors: Olli Simula, Juha Vesanto, Esa Alhoniemi and Jaakko Hollmén
Type: Chapter in Neuro-Fuzzy Techniques for Intelligent Information Systems
Description: process analysis, 16 pages, 1999
Availability: [zipped:PS] (188 kB), [PS] (1 MB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network for analysis and visualization of high-dimensional data. It maps nonlinear statistical relationships between high-dimensional input data into simple geometric relationships on a usually two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for efficient data visualization and clustering is often faced in various engineering problems. In this chapter, SOM based methods are applied in analysis, monitoring and modeling of complex systems.
Bibtex:
@InBook{simula99nftt,
    author =       {Olli Simula and Juha Vesanto and Esa Alhoniemi and
                    Jaakko {Hollm\'en}},
    title =        {Neuro-Fuzzy Techniques for Intelligent Information Systems},
    chapter =      {Analysis and Modeling of Complex Systems Using the 
                    Self-Organizing Map},
    publisher =    {Physica Verlag (Springer Verlag)},
    editor =       {N.~Kasabov and R.~Kozma},
    year =         {1999},
    pages =        {3--22}, 
    note =         {},
    isbn =         {3-7908-1187-4}, 
    annote =       {}
}

Process Monitoring and Modeling using the Self-Organizing Map

Authors: Esa Alhoniemi, Jaakko Hollmén, Olli Simula and Juha Vesanto
Type: journal article in Integrated Computer Aided Engineering
Description: process analysis, 17 pages, 1999
Availability: [PS] (1 MB), [zipped:PS] (187 kB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network method for analysis and visualization of high-dimensional data. It maps nonlinear statistical dependencies between high-dimensional measurement data into simple geometric relationships on a usually two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for visualization and clustering occurs, for instance, in the analysis of various engineering problems. In this paper, the SOM has been applied in monitoring and modeling of complex industrial processes. Case studies, including pulp process, steel production, and paper industry are described.
Bibtex:
@Article{alhoniemi98icae,
    author =       {Esa Alhoniemi and Jaakko {Hollm\'en} and Olli Simula
                    and Juha Vesanto},
    title =        {Process Monitoring and Modeling using the
                    Self-Organizing Map},
    journal =      {Integrated Computer Aided Engineering},
    publisher =    {John Wiley \& Sons},
    year =         {1999},
    volume =       {6},
    number =       {1},
    month =        {},
    pages =        {3--14},
    note =         {},
    annote =       {}
}

The Self-Organizing Map in Industry Analysis

Authors: Olli Simula, Petri Vasara, Juha Vesanto and Riina-Riitta Helminen
Type: Chapter 4 in "Industrial Applications of Neural Networks"
Description: industry analysis using SOM, 27 pages, 1999
Availability: [zipped:DOC] (359 kB), [zipped:PS] (404 kB), [PS] (2 MB), [DOC] (2 MB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network method for the analysis and visualization of high-dimensional data. It maps nonlinear statistical relationships between high-dimensional measurement data into simple geometric relationships, usually on a two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for visualization and clustering occurs, for instance, in the data analysis of complex processes or systems. In various engineering applications, entire fields of industry can be investigated using SOM based methods. The data exploration tool presented in this chapter allows visualization and analysis of large data bases of industrial systems. Forest industry is the first chosen application for the tool. To illustrate the global nature of forest industry, the example case is used to cluster the pulp and paper mills of the world.
Bibtex:
@InBook{simula99iann,
    author =       {Olli Simula and Petri Vasara and Juha Vesanto 
                    and Riina-Riitta Helminen},
    chapter =      {The Self-Organizing Map in Industry Analysis},
    title =        {Industrial Applications of Neural Networks},
    editor =       {L.C.~Jain and V.R.~Vemuri},
    publisher =    {CRC Press},
    year =         {1999},
    pages =        {87--112},
    annote =       {}
}

Conference articles

An Automated Report Generation Tool for the Data Understanding Phase

Authors: Juha Vesanto and Jaakko Hollmén
Type: Conference article in HIS'01
Description: Automated exploratory data analysis, 15 pages, 2001
Availability: [PS] (782 kB), [zipped:PS] (143 kB)
Abstract:
To prepare and model data successfully, the data miner needs to be aware of the properties of the data manifold. In this paper, the outline of a tool for automatically generating data survey reports for this purpose is described. The report combines linguistic descriptions (rules) and statistical measures with visualizations. Together these provide both quantitative and qualitative information and help the user to form a mental model of the data. The main focus is on describing the cluster structure and the contents of the clusters. The data is clustered using a novel algorithm based on the Self-Organizing Map. The rules describing the clusters are selected using a significance measure based on the confidence on their characterizing and discriminating properties.
Bibtex:
@InProceedings{vesanto2001his,
  author = 	 {Juha Vesanto and Jaakko Hollm{\'e}n},
  title = 	 {An Automated Report Generation Tool for the Data Understanding Phase},
  booktitle = 	 {Hybrid Intelligent Systems},
  publisher =	 {Physica Verlag}, 
  year =	 2002,
  editor =	 {A.~Abraham and M.~Koeppen},
  series =	 {Advances in Soft Computing},
  address =	 {Heidelberg},
  note =	 {In print.}
}

An Approach to Automated Interpretation of SOM

Authors: Markus Siponen, Juha Vesanto, Olli Simula, Petri Vasara
Type: Conference article in WSOM2001
Description: Automated interpretation of SOM using rules, 6 pages, 2001
Availability: [PS] (1 MB), [zipped:PS] (428 kB)
Abstract:
The objective of this work was to develop automatic tools for post-processing of SOMs, especially in the context of hierarchical data --- data where each higher level object consists of a varying number of lower level objects. Both low and high level data is available and needs to be utilized. The information from lower levels is transferred to higher level using data histograms of lower level clusters. The clusters are formed and interpreted automatically so as to summarize the information given by the SOM, and to produce meaningful indicators that are useful also to problem domain experts. The results show that the approach works well at least in the case study of pulp and paper mills technology data.
Bibtex:
@InProceedings{siponen2001wsom,
  author = 	 {Markus Siponen and Juha Vesanto and Olli Simula and Petri Vasara},
  title = 	 {An Approach to Automated Interpretation of SOM},
  booktitle = 	 {Proceedings of Workshop on Self-Organizing Map 2001},
  pages =	 {89--94},
  year =	 2001,
  editor =	 {Nigel Allinson and Hujun Yin and Lesley Allinson and Jon Slack},
  month =	 {June},
  publisher =	 {Springer}
}

Importance of Individual Variables in the k-Means Algorithm

Authors: Juha Vesanto
Type: Conference article in PAKDD2001
Description: Effect of scaling in k-means, 6 pages, 2001
Availability: [PS] (210 kB), [zipped:PS] (59 kB)
Abstract:
In this paper, quantization errors of individual variables in the k-means quantization algorithm are investigated with respect to scaling factors, variable dependency, and distribution characteristics. It is observed that Z-norm standardization limits average quantization errors per variable to unit range. Two measures, quantization quality and effective number of quantization points, are proposed for evaluating the goodness of quantization of individual variables. Both measures are invariant with respect to scaling/variances of variables. By comparing these measures between variables, a sense of the relative importance of variables is gained.
Bibtex:
@InProceedings{vesanto2001pakdd,
  author = 	 {Juha Vesanto},
  title = 	 {Importance of Individual Variables in the k-Means Algorithm},
  booktitle = 	 {Proceedings of the Pacific-Asia Conference Advances in Knowledge Discovery and Data Mining (PAKDD2001)},
  pages =	 {513--518},
  year =	 2001,
  editor =	 {David Cheung and Graham J.~Williams and Qing Li},
  month =	 {April},
  publisher =	 {Springer}
}

A SOM Based Cluster Visualization and Its Application for False Coloring

Authors: Johan Himberg
Type: Conference article in IJCNN2000
Description: Clustering visualization / false coloring of SOM, 6 pages, 2000
Availability: [zipped:PS] (159 kB), [PS] (6 MB)
Abstract:
The self-organizing map (SOM) is widely used as a data visualization method in various engineering applications. It performs a non-linear mapping from a high-dimensional data space to a lower dimensional visualization space. In this paper, a simple method for visualizing the cluster structure of SOM model vectors is presented. The method may be used to produce tree-like visualizations, but the main application here is to get different color codings that express the approximate cluster structure of the SOM model vectors. This coloring may be exploited in making false color (pseudo color) presentations of the original data. The method is especially meant for making an easily implementable, explorative cluster visualization tool.
Bibtex:
@InProceedings{jhimberg2000ijcnn,
  author = 	 {Johan Himberg},
  title = 	 {A SOM Based Cluster Visualization and Its Application for False Coloring},
  booktitle =    {Proceedings of International Joint Conference on 
                  Neural Networks (IJCNN2000)},
  pages =	 {587--592},
  volume =       {3}, 
  year =	 {2000}
}

Neural Network Tool for Data Mining: SOM Toolbox

Authors: Juha Vesanto
Type: Conference article in TOOLMET2000
Description: Computational complexity of SOM Toolbox, 13 pages, 2000
Availability: [zipped:PS] (189 kB), [PS(corrected)] (274 kB), [PS(original)] (256 kB)
Notes:
SOM Toolbox website.
Abstract:
Self-Organizing Map is an unsupervised neural network which combines vector quantization and vector projection. This makes it a powerful visualization tool. SOM Toolbox implements the SOM in the Matlab 5 computing environment. In this paper, computational complexity of SOM and the applicability of the Toolbox are investigated. It is seen that the Toolbox is easily applicable to small data sets (under 10000 records) but can also be applied in case of medium sized data sets. The prime limiting factor is map size: the Toolbox is mainly suitable for training maps with 1000 map units or less.
Bibtex:
@InProceedings{vesanto2000toolmet,
  author = 	 {Juha Vesanto},
  title = 	 {Neural Network Tool for Data Mining: SOM Toolbox},
  booktitle =    {Proceedings of Symposium on Tool Environments 
                  and Development Methods for Intelligent Systems (TOOLMET2000)},
  pages =	 {184--196},
  year =	 {2000},
  publisher =    {Oulun yliopistopaino},
  address =      {Oulu, Finland}
}

SOM Based Analysis of Pulping Process Data

Authors: Olli Simula and Esa Alhoniemi
Type: Conference article in IWANN'99
Description: process analysis, 1999
Availability: [zipped:PS] (375 kB), [PS] (2 MB)
Abstract:
Data driven analysis of complex systems or processes is necessary in many practical applications where analytical modeling is not possible. The Self-Organizing Map (SOM) is a neural network algorithm that has been widely applied in analysis and visualization of high-dimensional data. It carries out a nonlinear mapping of input data onto a two-dimensional grid. The mapping preserves the most important topological and metric relationships of the data. The SOM has turned out to be an efficient tool in data exploration tasks in various engineering applications: process analysis in forest industry, steel production and analysis of telecommunication networks and systems. In this paper, SOM based analysis of complex process data is discussed. As a case study, analysis of a continuous pulp digester is presented. The SOM is used to form visual presentations of the data. By interpreting the visualizations, complex parameter dependencies can be revealed. By concentrating on the significant measurements, reasons for digester faults can be determined.
Bibtex:
@InProceedings{simula99iwann,
  author = 	 {Olli Simula and Esa Alhoniemi},
  title = 	 {{SOM Based Analysis of Pulping Process Data}},
  booktitle =    {Proceedings of International Work-Conference 
                  on Artificial and Natural Neural Networks (IWANN '99)},
  pages =	 {567--577},
  year =	 {1999},
  volume =	 {II},
  publisher =	 {Springer},
  annote = 	 {}
}

Self-Organizing Map in Matlab: the SOM Toolbox

Authors: Juha Vesanto, Johan Himberg, Esa Alhoniemi and Juha Parhankangas
Type: Conference article in MATLAB-DSP 1999
Description: SOM Toolbox 2.0, 6 pages, 1999
Availability: [zipped:DOC] (89 kB), [DOC] (133 kB)
Notes:
SOM Toolbox website
Abstract:
The self-organizing map (SOM) is a vector quantization method which places the prototype vectors on a regular low-dimensional grid in an ordered fashion. This makes the SOM a powerful visualization tool. The SOM Toolbox is an implementation of the SOM and its visualization in the Matlab 5 computing environment. In this article, the SOM Toolbox and its usage are briefly presented. Its performance in terms of computational load is also evaluated and compared to a corresponding C-program.
Bibtex:
@InProceedings{vesanto99matlab,
  author =       {Juha Vesanto and Johan Himberg and Esa Alhoniemi and Juha Parhankangas},
  title =        {Self-Organizing Map in Matlab: the SOM Toolbox},
  booktitle =    {Proceedings of the Matlab DSP Conference 1999}, 
  address =      {Espoo, Finland},
  year =         {1999},
  month =        {November},
  pages =        {35--40},
  annote =       {}
}

Probabilistic Measures for Responses of Self-Organizing Map Units

Authors: Esa Alhoniemi, Johan Himberg and Juha Vesanto
Type: Conference article in CIMA'99
Description: pdf-estimation using SOM, 1999
Availability: [PS] (2 MB), [zipped:PS] (602 kB)
Abstract:
The self-organizing map (SOM) is a widely used data visualization tool in engineering applications. The algorithm performs a non-linear mapping from a high-dimensional data space to a low-dimensional space, which is typically a two-dimensional, rectangular grid. This makes it possible to present multidimensional data in two dimensions. Often the model vectors of the SOM and a new data sample need to be compared. The SOM, however, gives no probability measures to determine whether the sample belongs to data sets determined by map units. For this purpose a modified batch version of reduced kernel density estimator (RKDE) was tested. The results were compared with Gaussian Mixture Model (GMM) and S-Map.
Bibtex:
@InProceedings{alhoniemi99cima,
  author = 	 {Esa Alhoniemi and Johan Himberg and Juha Vesanto},
  title = 	 {{Probabilistic Measures for Responses of
                  Self-Organizing Map Units}},
  pages =	 {286--290},
  year =	 {1999},
  booktitle = 	 {Proceedings of the International ICSC Congress on Computational 
                  Intelligence Methods and Applications (CIMA '99)},
  editor =	 {H. Bothe and E. Oja and E. Massad and C. Haefke},
  publisher =	 {ICSC Academic Press},
  annote =	 {}
}

Hunting for Correlations in Data Using the Self-Organizing Map

Authors: Juha Vesanto and Jussi Ahola
Type: Conference article in CIMA'99
Description: correlation hunting, 1999
Availability: [PS] (330 kB)
Abstract:
The Self-Organizing Map (SOM) is an efficient tool for visualization of multidimensional numerical data. One of the tasks it is used for is correlation hunting. In this paper we present a simple method to enhance correlation hunting in the case of a large number of variables. Different variations of the method - component plane reorganization - are evaluated on complex test data. The purpose is to somewhat validate the use of SOM in correlation hunting and to evaluate the strengths and weaknesses of different reorganization procedures. A case with real-world data is also presented to show the usefulness of the method.
Bibtex:
@InProceedings{vesanto99cima,
  author = 	 {Juha Vesanto and Jussi Ahola},
  title = 	 {{Hunting for Correlations in Data Using the
                  Self-Organizing Map}},
  pages =	 {279--285},
  booktitle = 	 {Proceedings of the International ICSC Congress on Computational 
                  Intelligence Methods and Applications (CIMA '99)},
  year = 	 {1999},
  editor =	 {H. Bothe and E. Oja and E. Massad and C. Haefke},
  publisher =	 {ICSC Academic Press},
  annote =	 {}
}

Enhancing the SOM based data visualization by linking different data projections

Authors: Johan Himberg
Type: Conference article in IDEAL'98
Description: SOM visualization, 8 pages, 1998
Availability: [PS] (298 kB), [zipped:PS] (78 kB)
Abstract:
The self-organizing map (SOM) is widely used as a data visualization method especially in various engineering applications. It performs a non-linear mapping from a high-dimensional data space to a lower dimensional visualization space. The SOM can be used for example in correlation detection and cluster visualization in an explorative manner. In this paper two tools for refining the SOM-based visualization are presented. The first one brings out a sharper view to the correlation detection and the second one brings additional information to the input space distance visualization. Both tools are based on linking two different data projections using color coding. The tools are demonstrated using a real world data example from a queuing system.
Bibtex:
@InProceedings{himberg98ideal,
  author =       {Johan Himberg},
  title =        {Enhancing the SOM based data visualization by 
                  linking different data projections},
  booktitle =    {Proceedings of the International Symposium on 
                  Intelligent Data Engineering and Learning 
                  (IDEAL'98)}, 
  address =      {Hong Kong},
  year =         {1998},
  month =        {October},
  pages =        {427--434},
  annote =       {}
}

Enhancing SOM based data visualization

Authors: Juha Vesanto, Johan Himberg, Markus Siponen and Olli Simula
Type: Conference article in IIZUKA'98
Description: SOM visualization, 4 pages, 1998
Availability: [HTML] (4 kB), [PS] (728 kB), [zipped:PS] (152 kB)
Abstract:
The Self-Organizing Map (SOM) is an effective data exploration tool. One of the reasons for this is that it is conceptually very simple and its visualization is easy. In this paper, we propose new ways to enhance the visualization capabilities of the SOM in three areas: clustering, correlation hunting, and novelty detection. These enhancements are illustrated by various examples using real-world data.
Bibtex:
@InProceedings{vesanto98iizuka,
    author =       {Juha Vesanto and Johan Himberg and Markus Siponen
                    and Olli Simula},
    title =        {Enhancing SOM based data visualization},
    booktitle =    {Proceedings of the International Conference on 
                    Soft Computing and Information/Intelligent Systems
                    (IIZUKA'98)},
    address  =     {Iizuka, Japan}, 
    month =        {October},
    year =         {1998},
    pages =        {64--67},
    annote =       {}
}

Analysis of Industrial Systems Using the Self-Organizing Map

Authors: Olli Simula, Juha Vesanto and Petri Vasara
Type: Conference article in KES'98
Description: industry analysis, 8 pages, 1998
Availability: [zipped:DOC] (102 kB), [DOC] (672 kB)
Abstract:
The Self-Organizing Map (SOM) is a neural network algorithm which is especially suitable for the analysis and visualization of high-dimensional data. It maps nonlinear statistical relationships between high-dimensional input data into simple geometric relationships, usually on a two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for visualization and clustering occurs in various engineering applications, in the analysis of complex processes or systems. In addition, SOM allows easy data fusion enabling visualization and analysis of large data bases of industrial systems. As a case study, the SOM has been used to cluster the pulp and paper mills of the world.
Bibtex:
@InProceedings{simula98kes,
    author =       {Olli Simula and Juha Vesanto and Petri Vasara},
    title =        {Analysis of Industrial Systems Using the
                    Self-Organizing Map},
    booktitle =    {Proceedings of the International Conference on 
                    Knowledge-based Intelligent Systems (KES'98)},
    address =      {Adelaide, Australia},
    year =         {1998},
    month =        {April},
    volume =       {1},
    pages =        {61--68},
    annote =       {}
}

Integrating environmental, technological and financial data in forest industry analysis

Authors: Juha Vesanto, Petri Vasara, Riina-Riitta Helminen and Olli Simula
Type: Conference article in SNN'97
Description: industry analysis, 4 pages
Availability: [PS] (5 MB), [zipped:PS] (171 kB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network method for the analysis and visualisation of high-dimensional data. In this paper, the SOM algorithm is applied to the analysis of the technology of world paper and pulp industry. It is seen that the method can be used on environmental, technological and financial data to produce a comprehensive view of the industry as a whole.
Bibtex:
@InProceedings{vesanto97snn,
    author =       {Juha Vesanto and Petri Vasara and Riina-Riitta
                    Helminen and Olli Simula},
    title =        {Integrating environmental, technological and
                    financial data in forest industry analysis},
    booktitle =    {Proceedings of Stichting Neurale Netwerken 
                    Conference (SNN'97)}, 
    address =      {Amsterdam, Netherlands},
    year =         {1997},
    month =        {May},
    pages =        {153--156},
    annote =       {}
}

Analysis of Complex Systems Using the Self-Organizing Map

Authors: Olli Simula, Esa Alhoniemi, Jaakko Hollmén and Juha Vesanto
Type: Conference article in ICONIP'97
Description: process analysis, 5 pages, 1997
Availability: [PS] (198 kB), [zipped:PS] (66 kB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network method for the analysis and visualization of high-dimensional data. It maps nonlinear statistical relationships between high-dimensional input data into simple geometric relationships on a usually two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for efficient data visualization and clustering is often faced, for instance, in the analysis of various engineering problems. In this paper, the use of the SOM based methods in analysis, monitoring and modeling of complex industrial processes is discussed.
Bibtex:
@InProceedings{simula97iconip,
    author =       {Olli Simula and Esa Alhoniemi and Jaakko {Hollm\'en} 
                    and Juha Vesanto},
    title =        {Analysis of Complex Systems Using the 
                    Self-Organizing Map},
    booktitle =    {Proceedings of the International Conference on 
                    Neural Information Processing and Intelligent
                    Information Systems (ICONIP'97)},
    year =         {1997},
    pages =        {1313--1317},
    annote =       {}
}

Using the SOM and Local Models in Time-Series Prediction

Authors: Juha Vesanto
Type: Conference article in WSOM'97
Description: time-series prediction, 6 pages, 1997
Availability: [PS] (172 kB), [zipped:PS] (65 kB)
Abstract:
In this paper we test the Self-Organizing Map (SOM) on the problem of predicting chaotic time-series (specifically Mackey-Glass series) with local linear models defined separately for each of the prototype vectors of the SOM. We see that the method achieves good results. This together with the capabilities of the SOM make it a valuable tool in exploratory data mining.
Bibtex:
@InProceedings{vesanto97wsom,
    author =       {Juha Vesanto},
    title =        {Using the SOM and Local Models in Time-Series Prediction},
    booktitle =    {Proceedings of Workshop on Self-Organizing
                    Maps (WSOM'97)},
    address =      {Espoo, Finland},
    year =         {1997},
    month =        {June},
    pages =        {209--214},
    annote =       {}
}

Analyzing an Automatic Call Distribution System using the Self-Organizing Map

Authors: Johan Himberg and Olli Simula
Type: Conference article in FINSIG'97
Description: phone service data analysis, 1997
Availability: not available
Bibtex:
@InProceedings{Himberg97,
  author = 	 {Johan Himberg and Olli Simula},
  title = 	 {Analyzing an Automatic Call Distribution System
		  using the Self-Organizing Map},
  pages =	 {153--157},
  booktitle =	 {Proceedings of 1997 Finnish Signal Processing Symposium
		  (FINSIG'97)}, 
  address =      {Pori, Finland},
  year =	 {1997},
  month =        {May},
  annote =       {}
}

Monitoring and modeling of complex processes using hierarchical self-organizing maps

Authors: Olli Simula, Esa Alhoniemi, Jaakko Hollmén and Juha Vesanto
Type: Conference article in ISCAS'96
Description: process analysis, 4 pages, 1996
Availability: [PS] (1 MB), [zipped:PS] (86 kB)
Abstract:
In this paper, a neural network based analysis method for monitoring and modeling the dynamic behavior of complex industrial processes is considered. The method is based on the unsupervised learning property of the Self-Organizing Map (SOM) algorithm. The time series produced by several sensors measuring the process parameters as well as other process data are used in mapping the process behavior and dynamics into the network.
Bibtex:
@InProceedings{Simula96,
    author =       {Olli Simula and Esa Alhoniemi and Jaakko {Hollm\'en}
                    and Juha Vesanto},
    title =        {Monitoring and modeling of complex processes using
                    hierarchical self-organizing maps},
    booktitle =    {Proceedings of the {IEEE} International Symposium
                    on Circuits and Systems (ISCAS'96)},
    volume =       {Supplement},
    year =         {1996},
    month =        {May},
    pages =        {73--76},
    annote =       {}
}

Prediction Models and Sensitivity Analysis of Industrial Production Process Parameters by Using the Self-Organizing Map

Authors: Jaakko Hollmén and Olli Simula
Type: Conference article in NORSIG'96
Description: process analysis, 4 pages, 1996
Availability: not available
Bibtex:
@InProceedings{Hollmen96,
  author = 	 {Jaakko {Hollm\'en} and Olli Simula},
  title = 	 {Prediction Models and Sensitivity Analysis of
		  Industrial Production Process Parameters by Using
		  the Self-Organizing Map},
  booktitle = 	 {Proceedings of {IEEE} Nordic Signal Processing Symposium
                  (NORSIG'96)},
  year = 	 {1996},
  pages = 	 {79--82},
  annote =       {}
}

Presentations, other...

The SOM in data mining: analysis of world pulp and paper technology

Authors: Juha Vesanto
Type: Presentation in a Workshop in SCIA'97
Description: industry analysis, SOM software, 10 pages, 1997
Availability: [PS] (2 MB), [zipped:PS] (123 kB)
Abstract:
The Self-Organizing Map (SOM) is a powerful neural network method for the analysis and visualisation of high-dimensional data. In the Entire project, a data mining tool using the SOM was implemented and used to analyse world pulp and paper technology.
Bibtex:
@Unpublished{Vesanto97,
    author = 	 {Juha Vesanto},
    title = 	 {The SOM in data mining: analysis of world pulp 
                    and paper technology},
    note = 	 {Presentation in SCIA'97.},
    year = 	 {1997},
    annote =     {}
}

Thesis

Using SOM in Data Mining

Authors: Juha Vesanto
Type: Licentiate's thesis
Description: An overview of data mining process and using SOM in it, 57 pages, 2000
Availability: [zipped:PS,PDF] (902 kB), [PS] (7 MB), [PDF] (4 MB)
Notes:
The thesis consists of the introduction (given here) and three publications:
  1. Probabilistic Measures for Responses of Self-Organizing Map Units in Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA '99)
  2. SOM-Based Data Visualization Methods in Intelligent Data Analysis
  3. Clustering of the Self-Organizing Map in Transactions on Neural Networks (to be published)
Abstract:
Data mining as a research area answers to the challenge of analysing large databases in commerce, industry, and research. The purpose is to find new knowledge from databases where the dimensionality, complexity, or amount of data is prohibitively large for manual analysis. Data mining is an interactive process requiring that the intuition and background knowledge of application experts are coupled with the computational efficiency of modern computer technology.

The Self-Organizing Map (SOM) is one of the most popular neural network models. The SOM quantizes the data space formed by the training data and simultaneously performs a topology-preserving projection of the data onto a regular low-dimensional grid. The grid can be used efficiently in visualization.

This thesis consists of an introduction and three publications. In the introduction, an overview of each step of the data mining process is first presented, primarily based on the CRoss-Industry Standard Process model for Data Mining (CRISP-DM). Then the SOM algorithm and some of its variants are introduced, and the use of SOM in data mining is discussed. The publications deal with modeling, visualization and clustering of data using the SOM. In addition, the introduction discusses the use of SOM in summarization.

The SOM is especially suitable for data understanding, but it is a robust tool suitable for modeling and preparation of data as well. It offers a convenient workbench which helps in gaining an initial understanding of the data at hand, and it can be used for creating some initial models as well.

Keywords: Self-Organizing Map, data mining, knowledge discovery in databases, visualization, clustering, summarization, data survey

Bibtex:
@Booklet{vesanto2000licentiate,
  title = 	 {Using SOM in Data Mining},
  author = 	 {Juha Vesanto},
  howpublished = {Licentiate's thesis in the Helsinki University of Technology},
  month = 	 {April},
  year = 	 {2000},
  annote =       {}
}

Prosessin mittauksiin perustuva sulfaattisellun jatkuvatoimisen keiton analyysi

Authors: Esa Alhoniemi
Type: Licentiate's thesis
Description: analysis of kraft pulping process, 88 pages, 1998
Availability: not available
Notes:
In Finnish.
Bibtex:
@Booklet{alhoniemi98licentiate,
  title = 	 {Prosessin mittauksiin perustuva sulfaattisellun
		  jatkuvatoimisen keiton analyysi},
  author = 	 {Esa Alhoniemi},
  howpublished = {Licentiate's thesis in the Helsinki University of Technology},
  month = 	 {August},
  year = 	 {1998},
  annote =       {}
}

Data Mining for Finding Surface Defects in Steel Strips

Authors: Jukka Parviainen
Type: Master's thesis
Description: analysis of defects in steel strips, 59 pages, 2000
Availability: [zipped:PS] (1 MB)
Abstract:
Data mining is a collection of methods which build models that depict the behavior of a system. Data-driven methods have been developed recently, as large-scale data processing and storage have become cheap. Data mining is a user-centered, data-driven, interactive and iterative process whose stages are specification, data preparation, data survey, modeling and deployment.

A hot rolled strip is a steel product. Measurements from the rolling mill production line are recorded in databases. There are sometimes surface defects on the strip surface. These originate from casting, rolling, failed descaling or mechanical contact.

The connection between process parameters and surface defects was examined in the work. The aim was to find a model for controlling optimal set-up values in order to avoid surface defects, and to calculate a warning of a possible defect. The Self-Organizing Map (SOM) was used as a data mining tool.

The work is a result of the NEUROLL project and was carried out in the Laboratory of Computer and Information Science at Helsinki University of Technology. The project partner was Rautaruukki Steel in Raahe.

Keywords: data mining, hot rolled strip, surface quality, self-organizing map

Bibtex:
@MastersThesis{parviainen00master,
  author = 	 {Jukka K Parviainen},
  title = 	 {Data Mining for Finding Surface Defects in Steel Strips},
  school = 	 {Helsinki University of Technology},
  year = 	 {2000},
  month =        {September},
  annote =       {}
}

Itseorganisoiva kartta jatkuvatoimisen sinkityslinjan ohjauksessa

Authors: Henry Stenberg
Type: Master's thesis
Description: process analysis software, 53 pages, 1998
Availability: not available
Notes:
In Finnish.
Bibtex:
@MastersThesis{stenberg98master,
  author = 	 {Henry Stenberg},
  title = 	 {Itseorganisoiva kartta jatkuvatoimisen sinkityslinjan 
                  ohjauksessa},
  school = 	 {Helsinki University of Technology},
  year = 	 {1998},
  month =        {April},
  annote =       {}
}

Itseorganisoituvaan karttaan perustuva työkalu ja sen soveltaminen puheludatan analyysiin

Authors: Johan Himberg
Type: Master's thesis
Description: phone service data analysis, SOM visualization, 71 pages, 1997
Availability: not available
Notes:
In Finnish.
Bibtex:
@MastersThesis{himberg97master,
  author = 	 {Johan Himberg},
  title = 	 {Itseorganisoituvaan karttaan perustuva {ty\"okalu} 
                  ja sen soveltaminen puheludatan analyysiin},
  school = 	 {Helsinki University of Technology},
  year = 	 {1997},
  month =        {October},
  annote =       {}
}

Data Mining Techniques Based on the Self-Organizing Map

Authors: Juha Vesanto
Type: Master's thesis
Description: data mining, industry analysis, SOM software, 63 pages, 1997
Availability: [HTML] (4 kB), [zipped:PS] (1 MB)
Abstract:
Data mining is a part of a larger area of recent research in artificial intelligence and information management: knowledge discovery in databases (KDD). The purpose of KDD is to find new knowledge from databases in which the dimension, complexity or the amount of data has so far been prohibitively large for human observation alone. Data mining refers to the exploratory phase of knowledge discovery.

The Self-Organizing Map (SOM) is one of the most popular neural network models. The SOM quantizes the data space formed by the training data and simultaneously performs a topology-preserving projection of the data space on a regular two-dimensional grid. The SOM also has excellent visualization capabilities including techniques to give an informative picture of the data space, and techniques to compare data vectors or whole data sets with each other. The SOM can also be used for clustering, classification and modeling. The versatile properties of the SOM make it a valuable tool in data mining and knowledge discovery.

As part of this work a SOM-based data mining tool was implemented. The methods and tools presented in the work were used to analyze the pulp and paper industry worldwide and the Scandinavian industry in more detail with encouraging results. The analysis of technological data resulted in 20 major types of pulp and paper mills. Regarding Scandinavian industry a hierarchical structure of SOMs was used to combine technological, environmental and economical data.

The work has been done in the Laboratory of Computer and Information Science at the Helsinki University of Technology as part of the corporate project Entire in the technology program "Adaptive and Intelligent Systems Applications". The project was financed by Jaakko Pöyry Consulting and the Technology Development Center of Finland (TEKES).

Bibtex:
@MastersThesis{vesanto97master,
    author =       {Juha Vesanto},
    title =        {Data Mining Techniques Based on the
                    Self-Organizing Map},
    school =       {Helsinki University of Technology},
    year =         {1997},
    month =        {May},
    url =          {http://www.cis.hut.fi/projects/monitor/publications/html/mastersJV97/},
    annote =       {}
}

Process Modeling Using the Self-Organizing Map

Authors: Jaakko Hollmén
Type: Master's thesis
Description: process analysis, 50 pages, 1996
Availability: not available
Bibtex:
@MastersThesis{hollmen96master,
  author = 	 {Jaakko {Hollm\'en}},
  title = 	 {Process Modeling Using the Self-Organizing Map},
  school = 	 {Helsinki University of Technology},
  year = 	 {1996},
  month =        {February},
  annote =       {}
}

Monitoring of Complex Processes Using the Self-Organizing Map

Authors: Esa Alhoniemi
Type: Master's thesis
Description: process analysis, 50 pages, 1995
Availability: not available
Bibtex:
@MastersThesis{alhoniemi95master,
  author = 	 {Esa Alhoniemi},
  title = 	 {Monitoring of Complex Processes Using the
		  Self-Organizing Map},
  school = 	 {Helsinki University of Technology},
  year = 	 {1995},
  month =        {December},
  annote =       {}
}