Publications
Journal articles, book chapters
- The Self-Organizing Map as a Tool in Knowledge Engineering
-
Authors: Johan Himberg, Jussi Ahola, Esa Alhoniemi, Juha Vesanto and Olli Simula
Type: Chapter in Pattern Recognition in Soft Computing Paradigm
Description: SOM clustering, 28 pages, 2001
Availability: [PS] (8 MB), [zipped:PS] (869 kB)
- Notes:
-
- The PostScript file above differs in format but not in content from the published version.
- Some figures are in color.
- Abstract:
-
The Self-Organizing Map (SOM) is one of the most popular neural
network methods. It is a powerful tool in visualization and analysis
of high-dimensional data in various engineering applications. The SOM
maps the data on a two-dimensional grid which may be used as a base
for various kinds of visual approaches for clustering, correlation
and novelty detection. In this chapter, we present novel methods that
enhance the SOM based visualization in correlation hunting and novelty
detection. These methods are applied to two industrial case studies:
analysis of hot rolling of steel and continuous pulp process. A research
software for fast development of SOM based tools is briefly described.
- Bibtex:
-
@InBook{himberg2001prscp,
author = {Johan Himberg and Jussi Ahola and Esa Alhoniemi and Juha Vesanto and Olli Simula},
editor = {Nikhil R.~Pal},
title = {Pattern Recognition in Soft Computing Paradigm},
chapter = {The Self-Organizing Map as a Tool in Knowledge Engineering},
publisher = {World Scientific Publishing},
year = 2001,
series = {Soft Computing},
pages = {38--65}
}
- Clustering of the Self-Organizing Map
-
Authors: Juha Vesanto and Esa Alhoniemi
Type: journal article, in Volume 11(3) of IEEE Transactions on Neural Networks, special issue on data mining
Description: SOM clustering, 12 pages, 2000
Availability: [PS] (1 MB), [zipped:PS] (304 kB)
- Notes:
-
- The PostScript file above is the final submitted version of the paper
(prior to final proofreading and typesetting by IEEE).
- The proof of Var{mu} = Var{x}/N in section III.C. goes like this (for N independent samples x):
Var{mu} = Var{sum{x}/N} = Var{sum{x}} / N^2 = N Var{x} / N^2 = Var{x} / N
- Unfortunately we missed the work of Marie Cottrell et al. on
hierarchical clustering using SOM while preparing the paper. See
for example: "Analyzing and representing multidimensional quantitative and
qualitative data : Demographic study of the Rhône valley. The domestic
consumption of the Canadian families" (with P. Gaubert, P. Letremy,
P. Rousset), in Kohonen Maps, E. Oja and S. Kaski, Eds., Elsevier, Chap. 1,
pp. 1-14, 1999.
- Here's the clown data set used in the paper.
The zipped file includes the Matlab code used to generate the data,
and the data used in the paper both as ASCII flat file and as SOM Toolbox vs1
data struct (use 'load' command in Matlab to read it in).
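The variance identity in the note above can also be checked numerically. The following is a minimal Python sketch, not part of the paper or its Matlab code. It assumes the N samples are independent draws from a standard normal distribution, and the function name `variance_of_sample_mean` is made up for this illustration.

```python
import random
import statistics


def variance_of_sample_mean(n, trials=20000, seed=0):
    """Estimate Var{mu}, where mu is the mean of n independent N(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    # Population variance of the simulated means approximates Var{mu}.
    return statistics.pvariance(means)


if __name__ == "__main__":
    # Var{x} = 1 here, so each estimate should be close to 1/n.
    for n in (1, 4, 16):
        print(n, round(variance_of_sample_mean(n), 4), 1.0 / n)
```

If the samples were correlated, the step Var{sum{x}} = N Var{x} would no longer hold and the simulated values would drift away from 1/n.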
- Abstract:
-
The Self-Organizing Map (SOM) is an excellent tool in exploratory
phase of data mining. It projects input space on prototypes of a
low-dimensional regular grid which can be effectively utilized to
visualize and explore properties of the data. When the number of SOM
units is large, to facilitate quantitative analysis of the map and
the data, similar units need to be grouped, i.e., clustered. In this
paper, different approaches to clustering of the SOM are
considered. In particular, the use of hierarchical agglomerative
clustering and partitive clustering using k-means are
investigated. The two-stage procedure --- first using SOM to produce
the prototypes which are then clustered in the second stage --- is
found to perform well when compared to direct clustering of the data
and to reduce the computation time.
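The two-stage procedure described in the abstract can be illustrated with a toy example. This is a hedged sketch in pure Python, not the authors' implementation (the paper's experiments used Matlab): it assumes a 1-D chain of units instead of a full map grid, a simple linear decay for the learning rate and neighborhood radius, and plain k-means for the second stage. All function names and parameter values are invented for the illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def train_som(data, n_units, epochs=20, seed=0):
    """Stage 1: online training of a 1-D SOM (a chain of n_units prototypes)."""
    rng = random.Random(seed)
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            alpha = 0.5 * (1.0 - frac)  # learning rate decays linearly
            radius = max(0.5, 0.5 * n_units * (1.0 - frac))  # neighborhood shrinks
            bmu = nearest(x, protos)  # best-matching unit
            for j, w in enumerate(protos):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += alpha * h * (x[d] - w[d])
            step += 1
    return protos


def kmeans(points, k, iters=30, seed=0):
    """Stage 2: plain k-means (Lloyd's algorithm); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points]


def two_stage_clustering(data, n_units, k):
    """Cluster the SOM prototypes, then give each sample its BMU's cluster."""
    protos = train_som(data, n_units)
    proto_labels = kmeans(protos, k)
    return [proto_labels[nearest(x, protos)] for x in data]


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
    blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)]
    labels = two_stage_clustering(blob_a + blob_b, n_units=6, k=2)
    print(labels[:5], labels[-5:])
```

The second stage only ever sees the prototypes, so its cost depends on the number of units rather than on the number of samples, which is where the reduction in computation time comes from.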
- Bibtex:
-
@Article{vesanto2000tnn,
author = {Juha Vesanto and Esa Alhoniemi},
title = {Clustering of the Self-Organizing Map},
journal = {IEEE Transactions on Neural Networks},
publisher = {IEEE},
year = {2000},
volume = {11},
number = {3},
month = {May},
pages = {586--600},
note = {},
annote = {}
}
- SOM-Based Data Visualization Methods
-
Authors: Juha Vesanto
Type: journal article, in Volume 3(2) of IDA
Description: SOM visualization, 21 pages, 1999, errata published on November 22nd, 1999
Availability: [PS] (4 MB), [zipped:PS] (570 kB), [Errata:PS,PDF,SDW,DOC] (39 kB)
- Notes:
- Errata: In the printed and electronic versions of IDA and
in versions obtained from here prior to November 19th 1999, the citations
in Tables 1 and 2 are almost all wrong and some references are missing
from the bibliography. Please download the errata file to obtain
corrected Tables and the omitted references. The errata file contains
errata sheets in PostScript, PDF, MS-Word 95 and StarWriter 5.0.
In the version available here the references have been corrected.
- Abstract:
-
The Self-Organizing Map (SOM) is an efficient tool for
visualization of multidimensional numerical data. In this paper, an
overview and categorization of both old and new methods for the
visualization of SOM is presented. The purpose is to give an idea
of what kind of information can be acquired from different
presentations and how the SOM can best be utilized in exploratory
data visualization. Most of the presented methods can also be applied in
the more general case of first making a vector quantization (e.g.
k-means) and then a vector projection (e.g. Sammon's mapping).
- Bibtex:
-
@Article{vesanto99ida,
author = {Juha Vesanto},
title = {SOM-Based Data Visualization Methods},
journal = {Intelligent Data Analysis},
publisher = {Elsevier Science},
year = {1999},
volume = {3},
number = {2},
month = {},
pages = {111--126},
note = {},
annote = {}
}
- Self-Organizing Map for Data Mining in MATLAB: the SOM Toolbox
-
Authors: Juha Vesanto, Esa Alhoniemi, Johan Himberg, Kimmo Kiviluoto and Jukka Parviainen
Type: presentation in SNE journal
Description: SOM Toolbox 1.0, 1 page, 1999
Availability: [DOC] (3 MB), [zipped:DOC] (90 kB)
- Notes:
- SOM Toolbox website
- Abstract:
-
The SOM Toolbox is a free function library for MATLAB 5 implementing
the Self-Organizing Map (SOM) algorithm.
- Bibtex:
-
@Article{vesanto99sne,
author = {Juha Vesanto and Esa Alhoniemi and Johan Himberg and
Kimmo Kiviluoto and Jukka Parviainen},
title = {Self-Organizing Map for Data Mining in MATLAB:
the SOM Toolbox},
journal = {Simulation News Europe},
publisher = {ARGE Simulation News},
year = {1999},
volume = {},
number = {25},
month = {March},
pages = {54},
note = {},
annote = {}
}
- Analysis and Modeling of Complex Systems Using the Self-Organizing Map
-
Authors: Olli Simula, Juha Vesanto, Esa Alhoniemi and Jaakko Hollmén
Type: Chapter in Neuro-Fuzzy Techniques for Intelligent Information Systems
Description: process analysis, 16 pages, 1999
Availability: [zipped:PS] (188 kB), [PS] (1 MB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network for
analysis and visualization of high-dimensional data. It maps
nonlinear statistical relationships between high-dimensional input
data into simple geometric relationships on a usually
two-dimensional grid. The mapping roughly preserves the most
important topological and metric relationships of the original data
elements and, thus, inherently clusters the data. The need for
efficient data visualization and clustering is often faced in
various engineering problems. In this chapter, SOM based methods
are applied in analysis, monitoring and modeling of complex systems.
- Bibtex:
-
@InBook{simula99nftt,
author = {Olli Simula and Juha Vesanto and Esa Alhoniemi and
Jaakko {Hollm\'en}},
title = {Neuro-Fuzzy Techniques for Intelligent Information Systems},
chapter = {Analysis and Modeling of Complex Systems Using the
Self-Organizing Map},
publisher = {Physica Verlag (Springer Verlag)},
editor = {N.~Kasabov and R.~Kozma},
year = {1999},
pages = {3--22},
note = {},
isbn = {3-7908-1187-4},
annote = {}
}
- Process Monitoring and Modeling using the Self-Organizing Map
-
Authors: Esa Alhoniemi, Jaakko Hollmén, Olli Simula and Juha Vesanto
Type: journal article in Integrated Computer Aided Engineering
Description: process analysis, 17 pages, 1999
Availability: [PS] (1 MB), [zipped:PS] (187 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network method
for analysis and visualization of high-dimensional data. It maps
nonlinear statistical dependencies between high-dimensional
measurement data into simple geometric relationships on a usually
two-dimensional grid. The mapping roughly preserves the most
important topological and metric relationships of the original data
elements and, thus, inherently clusters the data. The need for
visualization and clustering occurs, for instance, in the analysis of
various engineering problems. In this paper, the SOM has been
applied in monitoring and modeling of complex industrial processes.
Case studies, including pulp process, steel production, and paper
industry are described.
- Bibtex:
-
@Article{alhoniemi98icae,
author = {Esa Alhoniemi and Jaakko {Hollm\'en} and Olli Simula
and Juha Vesanto},
title = {Process Monitoring and Modeling using the
Self-Organizing Map},
journal = {Integrated Computer Aided Engineering},
publisher = {John Wiley \& Sons},
year = {1999},
volume = {6},
number = {1},
month = {},
pages = {3--14},
note = {},
annote = {}
}
- The Self-Organizing Map in Industry Analysis
-
Authors: Olli Simula, Petri Vasara, Juha Vesanto and Riina-Riitta Helminen
Type: Chapter 4 in "Industrial Applications of Neural Networks"
Description: industry analysis using SOM, 27 pages, 1999
Availability: [zipped:DOC] (359 kB), [zipped:PS] (404 kB), [PS] (2 MB), [DOC] (2 MB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network method
for the analysis and visualization of high-dimensional data. It
maps nonlinear statistical relationships between high-dimensional
measurement data into simple geometric relationships, usually on a
two-dimensional grid. The mapping roughly preserves the most
important topological and metric relationships of the original data
elements and, thus, inherently clusters the data.
The need for visualization and clustering occurs, for
instance, in the data analysis of complex processes
or systems. In various engineering applications, entire
fields of industry can be investigated using SOM based methods.
The data exploration tool presented in this chapter
allows visualization and analysis of large data bases of industrial
systems. Forest industry
is the first chosen application for the tool. To illustrate
the global nature of forest industry, the example case is used to
cluster the pulp and paper mills of the world.
- Bibtex:
-
@InBook{simula99iann,
author = {Olli Simula and Petri Vasara and Juha Vesanto
and Riina-Riitta Helminen},
chapter = {The Self-Organizing Map in Industry Analysis},
title = {Industrial Applications of Neural Networks},
editor = {L.C.~Jain and V.R.~Vemuri},
publisher = {CRC Press},
year = {1999},
pages = {87--112},
annote = {}
}
Conference articles
- An Automated Report Generation Tool for the Data Understanding Phase
-
Authors: Juha Vesanto and Jaakko Hollmén
Type: Conference article in HIS'01
Description: Automated exploratory data analysis, 15 pages, 2001
Availability: [PS] (782 kB), [zipped:PS] (143 kB)
- Abstract:
-
To prepare and model data successfully, the data miner needs to be
aware of the properties of the data manifold. In this paper, the
outline of a tool for automatically generating data survey reports
for this purpose is described. The report combines linguistic
descriptions (rules) and statistical measures with visualizations.
Together these provide both quantitative and qualitative information
and help the user to form a mental model of the data. The main focus
is on describing the cluster structure and the contents of the
clusters. The data is clustered using a novel algorithm based on the
Self-Organizing Map. The rules describing the clusters are selected
using a significance measure based on the confidence on their
characterizing and discriminating properties.
- Bibtex:
-
@InProceedings{vesanto2001his,
author = {Juha Vesanto and Jaakko Hollm{\'e}n},
title = {An Automated Report Generation Tool for the Data Understanding Phase},
booktitle = {Hybrid Intelligent Systems},
publisher = {Physica Verlag},
year = 2002,
editor = {A.~Abraham and M.~Koeppen},
series = {Advances in Soft Computing},
address = {Heidelberg},
note = {In print.}
}
- An Approach to Automated Interpretation of SOM
-
Authors: Markus Siponen, Juha Vesanto, Olli Simula, Petri Vasara
Type: Conference article in WSOM2001
Description: Automated interpretation of SOM using rules, 6 pages, 2001
Availability: [PS] (1 MB), [zipped:PS] (428 kB)
- Abstract:
-
The objective of this work was to develop automatic tools for
post-processing of SOMs, especially in the context of hierarchical
data --- data where each higher level object consists of a varying
number of lower level objects. Both low and high level data is
available and needs to be utilized. The information from lower
levels is transferred to higher level using data histograms of lower
level clusters. The clusters are formed and interpreted
automatically so as to summarize the information given by the SOM,
and to produce meaningful indicators that are useful also to problem
domain experts. The results show that the approach works well at
least in the case study of pulp and paper mills technology data.
- Bibtex:
-
@InProceedings{siponen2001wsom,
author = {Markus Siponen and Juha Vesanto and Olli Simula and Petri Vasara},
title = {An Approach to Automated Interpretation of SOM},
booktitle = {Proceedings of Workshop on Self-Organizing Map 2001},
pages = {89--94},
year = 2001,
editor = {Nigel Allinson and Hujun Yin and Lesley Allinson and Jon Slack},
month = {June},
publisher = {Springer}
}
- Importance of Individual Variables in the k-Means Algorithm
-
Authors: Juha Vesanto
Type: Conference article in PAKDD2001
Description: Effect of scaling in k-means, 6 pages, 2001
Availability: [PS] (210 kB), [zipped:PS] (59 kB)
- Abstract:
-
In this paper, quantization errors of individual variables in
k-means quantization algorithm are investigated with respect to
scaling factors, variable dependency, and distribution
characteristics. It is observed that Z-norm standardization limits
average quantization errors per variable to unit range. Two
measures, quantization quality and effective number of quantization
points are proposed for evaluating the goodness of quantization of
individual variables. Both measures are invariant with respect to
scaling/variances of variables. By comparing these measures
between variables, a sense of the relative importance of variables
is gained.
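The effect of Z-norm standardization on per-variable quantization error can be made concrete with a small sketch. This is an illustration only, not code from the paper. It assumes mean squared error as the error measure (the abstract does not specify one) and uses a hand-picked partition in place of an actual k-means run. The helper names `z_norm` and `per_variable_error` are hypothetical.

```python
import statistics


def z_norm(rows):
    """Scale every variable (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # a constant column would give 0 here
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def per_variable_error(rows, labels):
    """Mean squared quantization error of each variable when every row
    is replaced by the centroid (mean) of its group."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    centroids = {
        lab: [statistics.fmean(c) for c in zip(*members)]
        for lab, members in groups.items()
    }
    n_vars = len(rows[0])
    errors = [0.0] * n_vars
    for row, lab in zip(rows, labels):
        for d in range(n_vars):
            errors[d] += (row[d] - centroids[lab][d]) ** 2
    return [e / len(rows) for e in errors]


if __name__ == "__main__":
    # Two variables on very different scales.
    raw = [[1.0, 200.0], [2.0, 180.0], [3.0, 900.0], [4.0, 1000.0]]
    labels = [0, 0, 1, 1]  # assumed partition, standing in for a k-means result
    print("raw:   ", per_variable_error(raw, labels))
    print("z-norm:", per_variable_error(z_norm(raw), labels))
```

With centroids taken as group means, the within-group variance of a variable cannot exceed its total variance, so after standardization each per-variable error stays between 0 and 1 and the variables become directly comparable.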
- Bibtex:
-
@InProceedings{vesanto2001pakdd,
author = {Juha Vesanto},
title = {Importance of Individual Variables in the k-Means Algorithm},
booktitle = {Proceedings of the Pacific-Asia Conference Advances in Knowledge Discovery and Data Mining (PAKDD2001)},
pages = {513--518},
year = 2001,
editor = {David Cheung and Graham J.~Williams and Qing Li},
month = {April},
publisher = {Springer}
}
- A SOM Based Cluster Visualization and Its Application for False Coloring
-
Authors: Johan Himberg
Type: Conference article in IJCNN2000
Description: Clustering visualization / false coloring of SOM, 6 pages, 2000
Availability: [zipped:PS] (159 kB), [PS] (6 MB)
- Abstract:
-
The self-organizing map (SOM) is widely used as a data visualization method
in various engineering applications. It performs a non-linear mapping from a
high-dimensional data space to a lower dimensional visualization space. In
this paper, a simple method for visualizing the cluster structure of SOM
model vectors is presented. The method may be used to produce tree-like
visualizations, but the main application here is to get different color
codings that express the approximate cluster structure of the SOM model
vectors. This coloring may be exploited in making false color (pseudo
color) presentations of the original data. The method is especially meant
for making an easily implementable, explorative cluster visualization tool.
- Bibtex:
-
@InProceedings{jhimberg2000ijcnn,
author = {Johan Himberg},
title = {A SOM Based Cluster Visualization and Its Application for False Coloring},
booktitle = {Proceedings of International Joint Conference on
Neural Networks (IJCNN2000)},
pages = {587--592},
volume = {3},
year = {2000}
}
- Neural Network Tool for Data Mining: SOM Toolbox
-
Authors: Juha Vesanto
Type: Conference article in TOOLMET2000
Description: Computational complexity of SOM Toolbox, 13 pages, 2000
Availability: [zipped:PS] (189 kB), [PS(corrected)] (274 kB), [PS(original)] (256 kB)
- Notes:
- SOM Toolbox website.
- Abstract:
-
Self-Organizing Map is an unsupervised neural network which combines
vector quantization and vector projection. This makes it a powerful
visualization tool. SOM Toolbox implements the SOM in the Matlab 5
computing environment. In this paper, computational complexity of
SOM and the applicability of the Toolbox are investigated. It is
seen that the Toolbox is easily applicable to small data sets (under
10000 records) but can also be applied in case of medium sized data
sets. The prime limiting factor is map size: the Toolbox is mainly
suitable for training maps with 1000 map units or less.
- Bibtex:
-
@InProceedings{vesanto2000toolmet,
author = {Juha Vesanto},
title = {Neural Network Tool for Data Mining: SOM Toolbox},
booktitle = {Proceedings of Symposium on Tool Environments
and Development Methods for Intelligent Systems (TOOLMET2000)},
pages = {184--196},
year = {2000},
publisher = {Oulun yliopistopaino},
address = {Oulu, Finland}
}
- SOM Based Analysis of Pulping Process Data
-
Authors: Olli Simula and Esa Alhoniemi
Type: Conference article in IWANN'99
Description: process analysis, 1999
Availability: [zipped:PS] (375 kB), [PS] (2 MB)
- Abstract:
-
Data driven analysis of complex systems or processes is necessary in
many practical applications where analytical modeling is not
possible. The Self-Organizing Map (SOM) is a neural network
algorithm that has been widely applied in analysis and visualization
of high-dimensional data. It carries out a nonlinear mapping of
input data onto a two-dimensional grid. The mapping preserves the
most important topological and metric relationships of the data.
The SOM has turned out to be an efficient tool in data
exploration tasks in various engineering applications: process
analysis in forest industry, steel production and analysis of
telecommunication networks and systems. In this paper, SOM based
analysis of complex process data is discussed. As a case study,
analysis of a continuous pulp digester is presented. The SOM is used
to form visual presentations of the data. By interpreting the
visualizations, complex parameter dependencies can be revealed. By
concentrating on the significant measurements, reasons for digester
faults can be determined.
- Bibtex:
-
@InProceedings{simula99iwann,
author = {Olli Simula and Esa Alhoniemi},
title = {{SOM Based Analysis of Pulping Process Data}},
booktitle = {Proceedings of International Work-Conference
on Artificial and Natural Neural Networks (IWANN '99)},
pages = {567--577},
year = {1999},
volume = {II},
publisher = {Springer},
annote = {}
}
- Self-Organizing Map in Matlab: the SOM Toolbox
-
Authors: Juha Vesanto, Johan Himberg, Esa Alhoniemi and Juha Parhankangas
Type: Conference article in MATLAB-DSP 1999
Description: SOM Toolbox 2.0, 6 pages, 1999
Availability: [zipped:DOC] (89 kB), [DOC] (133 kB)
- Notes:
- SOM Toolbox website
- Abstract:
-
The self-organizing map (SOM) is a vector quantization method
which places the prototype vectors on a regular low-dimensional
grid in an ordered fashion. This makes the SOM a powerful visualization tool.
The SOM Toolbox is an implementation of the SOM and its visualization
in the Matlab 5 computing environment. In this article, the SOM Toolbox
and its usage are briefly presented. Also its performance in terms
of computational load is evaluated and compared to a corresponding
C-program.
- Bibtex:
-
@InProceedings{vesanto99matlab,
author = {Juha Vesanto and Johan Himberg and Esa Alhoniemi and Juha Parhankangas},
title = {Self-Organizing Map in Matlab: the SOM Toolbox},
booktitle = {Proceedings of the Matlab DSP Conference 1999},
address = {Espoo, Finland},
year = {1999},
month = {November},
pages = {35--40},
annote = {}
}
- Probabilistic Measures for Responses of Self-Organizing Map Units
-
Authors: Esa Alhoniemi, Johan Himberg and Juha Vesanto
Type: Conference article in CIMA'99
Description: pdf-estimation using SOM, 1999
Availability: [PS] (2 MB), [zipped:PS] (602 kB)
- Abstract:
-
The self-organizing map (SOM) is a widely used
data visualization tool in engineering applications.
The algorithm performs a non-linear mapping from
a high-dimensional data space to a low-dimensional space, which is
typically a two-dimensional, rectangular grid. This makes it possible
to present multidimensional data in two dimensions.
Often the model vectors of the SOM and a new data sample need to be
compared. The SOM, however, gives no probability measures to
determine, if the sample belongs to data sets determined by map
units. For this purpose a modified batch version of reduced kernel
density estimator (RKDE) was tested. The results
were compared with Gaussian Mixture Model (GMM) and S-Map.
- Bibtex:
-
@InProceedings{alhoniemi99cima,
author = {Esa Alhoniemi and Johan Himberg and Juha Vesanto},
title = {{Probabilistic Measures for Responses of
Self-Organizing Map Units}},
pages = {286--290},
year = {1999},
booktitle = {Proceedings of the International ICSC Congress on Computational
Intelligence Methods and Applications (CIMA '99)},
editor = {H. Bothe and E. Oja and E. Massad and C. Haefke},
publisher = {ICSC Academic Press},
annote = {}
}
- Hunting for Correlations in Data Using the Self-Organizing Map
-
Authors: Juha Vesanto and Jussi Ahola
Type: Conference article in CIMA'99
Description: correlation hunting, 1999
Availability: [PS] (330 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is an efficient tool for
visualization of multidimensional numerical data. One of the tasks
it is used for is correlation hunting. In this paper we present a
simple method to enhance correlation hunting in the case of a
large number of variables. Different variations of the method -
component plane reorganization - are evaluated on complex test
data. The purpose is to somewhat validate the use of SOM in
correlation hunting and to evaluate the strengths and weaknesses
of different reorganization procedures. A case with real-world
data is also presented to show the usefulness of the method.
- Bibtex:
-
@InProceedings{vesanto99cima,
author = {Juha Vesanto and Jussi Ahola},
title = {{Hunting for Correlations in Data Using the
Self-Organizing Map}},
pages = {279--285},
booktitle = {Proceedings of the International ICSC Congress on Computational
Intelligence Methods and Applications (CIMA '99)},
year = {1999},
editor = {H. Bothe and E. Oja and E. Massad and C. Haefke},
publisher = {ICSC Academic Press},
annote = {}
}
- Enhancing the SOM based data visualization by
linking different data projections
-
Authors: Johan Himberg
Type: Conference article in IDEAL'98
Description: SOM visualization, 8 pages, 1998
Availability: [PS] (298 kB), [zipped:PS] (78 kB)
- Abstract:
-
The self-organizing map (SOM) is widely used as a data visualization
method especially in various engineering applications. It performs a
non-linear mapping from a high-dimensional data space to a lower
dimensional visualization space. The SOM can be used for example in
correlation detection and cluster visualization in explorative
manner. In this paper two tools for refining the SOM-based visualization
are presented. The first one brings out a sharper view to the
correlation detection and the second one brings additional information
to the input space distance visualization. Both tools are based on
linking two different data projections using color coding. The tools
are demonstrated using a real world data example from a queuing
system.
- Bibtex:
-
@InProceedings{himberg98ideal,
author = {Johan Himberg},
title = {Enhancing the SOM based data visualization by
linking different data projections},
booktitle = {Proceedings of the International Symposium on
Intelligent Data Engineering and Learning
(IDEAL'98)},
address = {Hong Kong},
year = {1998},
month = {October},
pages = {427--434},
annote = {}
}
- Enhancing SOM based data visualization
-
Authors: Juha Vesanto, Johan Himberg, Markus Siponen and Olli Simula
Type: Conference article in IIZUKA'98
Description: SOM visualization, 4 pages, 1998
Availability: [HTML] (4 kB), [PS] (728 kB), [zipped:PS] (152 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is an effective data exploration tool.
One of the reasons for this is that it is conceptually very simple
and its visualization is easy. In this paper, we propose new ways
to enhance the visualization capabilities of the SOM in three areas:
clustering, correlation hunting, and novelty detection. These
enhancements are illustrated by various examples using real-world
data.
- Bibtex:
-
@InProceedings{vesanto98iizuka,
author = {Juha Vesanto and Johan Himberg and Markus Siponen
and Olli Simula},
title = {Enhancing SOM based data visualization},
booktitle = {Proceedings of the International Conference on
Soft Computing and Information/Intelligent Systems
(IIZUKA'98)},
address = {Iizuka, Japan},
month = {October},
year = {1998},
pages = {64--67},
annote = {}
}
- Analysis of Industrial Systems Using the Self-Organizing Map
-
Authors: Olli Simula, Juha Vesanto and Petri Vasara
Type: Conference article in KES'98
Description: industry analysis, 8 pages, 1998
Availability: [zipped:DOC] (102 kB), [DOC] (672 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is a neural network algorithm which is
especially suitable for the analysis and visualization of
high-dimensional data. It maps nonlinear statistical relationships
between high-dimensional input data into simple geometric
relationships, usually on a two-dimensional grid. The mapping roughly
preserves the most important topological and metric relationships of
the original data elements and, thus, inherently clusters the
data. The need for visualization and clustering occurs in various
engineering applications, in the analysis of complex processes or
systems. In addition, SOM allows easy data fusion enabling
visualization and analysis of large data bases of industrial
systems. As a case study, the SOM has been used to cluster the pulp
and paper mills of the world.
- Bibtex:
-
@InProceedings{simula98kes,
author = {Olli Simula and Juha Vesanto and Petri Vasara},
title = {Analysis of Industrial Systems Using the
Self-Organizing Map},
booktitle = {Proceedings of the International Conference on
Knowledge-based Intelligent Systems (KES'98)},
address = {Adelaide, Australia},
year = {1998},
month = {April},
volume = {1},
pages = {61--68},
annote = {}
}
- Integrating environmental, technological and
financial data in forest industry analysis
-
Authors: Juha Vesanto, Petri Vasara, Riina-Riitta Helminen and Olli Simula
Type: Conference article in SNN'97
Description: industry analysis, 4 pages
Availability: [PS] (5 MB), [zipped:PS] (171 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network method
for the analysis and visualisation of high-dimensional data.
In this paper, the SOM algorithm is applied to the analysis of the
technology of world paper and pulp industry. It is seen that the
method can be used on environmental, technological and financial data
to produce a comprehensive view of the industry as a whole.
- Bibtex:
-
@InProceedings{vesanto97snn,
author = {Juha Vesanto and Petri Vasara and Riina-Riitta
Helminen and Olli Simula},
title = {Integrating environmental, technological and
financial data in forest industry analysis},
booktitle = {Proceedings of Stichting Neurale Netwerken
Conference (SNN'97)},
address = {Amsterdam, Netherlands},
year = {1997},
month = {May},
pages = {153--156},
annote = {}
}
- Analysis of Complex Systems Using the Self-Organizing Map
-
Authors: Olli Simula, Esa Alhoniemi, Jaakko Hollmén and Juha Vesanto
Type: Conference article in ICONIP'97
Description: process analysis, 5 pages, 1997
Availability: [PS] (198 kB), [zipped:PS] (66 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network method
for the analysis and visualization of high-dimensional data. It maps
nonlinear statistical relationships between high-dimensional input
data into simple geometric relationships on a usually
two-dimensional grid. The mapping roughly preserves the most
important topological and metric relationships of the original data
elements and, thus, inherently clusters the data. The need for
efficient data visualization and clustering is often faced, for
instance, in the analysis of various engineering problems. In this
paper, the use of the SOM based methods in analysis, monitoring and
modeling of complex industrial processes is discussed.
- Bibtex:
-
@InProceedings{simula97iconip,
author = {Olli Simula and Esa Alhoniemi and Jaakko {Hollm\'en}
and Juha Vesanto},
title = {Analysis of Complex Systems Using the
Self-Organizing Map},
booktitle = {Proceedings of the International Conference on
Neural Information Processing and Intelligent
Information Systems (ICONIP'97)},
year = {1997},
pages = {1313--1317},
annote = {}
}
- Using the SOM and Local Models in Time-Series Prediction
-
Authors: Juha Vesanto
Type: Conference article in WSOM'97
Description: time-series prediction, 6 pages, 1997
Availability: [PS] (172 kB), [zipped:PS] (65 kB)
- Abstract:
-
In this paper we test the Self-Organizing Map (SOM) on the problem
of predicting chaotic time-series (specifically Mackey-Glass series)
with local linear models defined separately for each of the
prototype vectors of the SOM. We see that the method achieves good
results. This together with the capabilities of the SOM make it a
valuable tool in exploratory data mining.
- Bibtex:
-
@InProceedings{vesanto97wsom,
author = {Juha Vesanto},
title = {Using the SOM and Local Models in Time-Series Prediction},
booktitle = {Proceedings of Workshop on Self-Organizing
Maps (WSOM'97)},
address = {Espoo, Finland},
year = {1997},
month = {June},
pages = {209--214},
annote = {}
}
- Analyzing an Automatic Call Distribution System
using the Self-Organizing Map
-
Authors: Johan Himberg and Olli Simula
Type: Conference article in FINSIG'97
Description: phone service data analysis, 1997
Availability: not available
- Bibtex:
-
@InProceedings{Himberg97,
author = {Johan Himberg and Olli Simula},
title = {Analyzing an Automatic Call Distribution System
using the Self-Organizing Map},
pages = {153--157},
booktitle = {Proceedings of 1997 Finnish Signal Processing Symposium
(FINSIG'97)},
address = {Pori, Finland},
year = {1997},
month = {May},
annote = {}
}
- Monitoring and modeling of complex processes using hierarchical self-organizing maps
-
Authors: Olli Simula, Esa Alhoniemi, Jaakko Hollmén and Juha Vesanto
Type: Conference article in ISCAS'96
Description: process analysis, 4 pages, 1996
Availability: [PS] (1 MB), [zipped:PS] (86 kB)
- Abstract:
-
In this paper, a neural network based analysis method for monitoring
and modeling the dynamic behavior of complex industrial processes is
considered. The method is based on the unsupervised learning
property of the Self-Organizing Map (SOM) algorithm. The time series
produced by several sensors measuring the process parameters as well
as other process data are used in mapping the process behavior and
dynamics into the network.
- Bibtex:
-
@InProceedings{Simula96,
author = {Olli Simula and Esa Alhoniemi and Jaakko {Hollm\'en}
and Juha Vesanto},
title = {Monitoring and modeling of complex processes using
hierarchical self-organizing maps},
booktitle = {Proceedings of the {IEEE} International Symposium
on Circuits and Systems (ISCAS'96)},
volume = {Supplement},
year = {1996},
month = {May},
pages = {73--76},
annote = {}
}
- Prediction Models and Sensitivity Analysis of
Industrial Production Process Parameters by Using
the Self-Organizing Map
-
Authors: Jaakko Hollmén and Olli Simula
Type: Conference article in NORSIG'96
Description: process analysis, 4 pages, 1996
Availability: not available
- Bibtex:
-
@InProceedings{Hollmen96,
author = {Jaakko {Hollm\'en} and Olli Simula},
title = {Prediction Models and Sensitivity Analysis of
Industrial Production Process Parameters by Using
the Self-Organizing Map},
booktitle = {Proceedings of {IEEE} Nordic Signal Processing Symposium
(NORSIG'96)},
year = {1996},
pages = {79--82},
annote = {}
}
Presentations, other...
- The SOM in data mining: analysis of world pulp and paper technology
-
Authors: Juha Vesanto
Type: Presentation in a Workshop in SCIA'97
Description: industry analysis, SOM software, 10 pages, 1997
Availability: [PS] (2 MB), [zipped:PS] (123 kB)
- Abstract:
-
The Self-Organizing Map (SOM) is a powerful neural network method
for the analysis and visualisation of high-dimensional data. In the
Entire project, a data mining tool using the SOM was implemented and
used to analyse world pulp and paper technology.
- Bibtex:
-
@Unpublished{Vesanto97,
author = {Juha Vesanto},
title = {The SOM in data mining: analysis of world pulp
and paper technology},
note = {Presentation in SCIA'97.},
year = {1997},
annote = {}
}
Thesis
- Using SOM in Data Mining
-
Authors: Juha Vesanto
Type: Licentiate's thesis
Description: An overview of the data mining process and the use of the SOM in it, 57 pages, 2000
Availability: [zipped:PS,PDF] (902 kB), [PS] (7 MB), [PDF] (4 MB)
- Notes:
- The thesis consists of the introduction (given here) and three publications:
- Probabilistic Measures for Responses of Self-Organizing
Map Units in Proceedings of the International ICSC Congress on Computational
Intelligence Methods and Applications (CIMA '99)
- SOM-Based Data Visualization Methods in
Intelligent Data Analysis
- Clustering of the Self-Organizing Map
in Transactions on Neural Networks (to be published)
- Abstract:
-
Data mining as a research area answers the challenge of analysing
large databases in commerce, industry, and research. The purpose is
to find new knowledge from databases where the dimensionality,
complexity, or amount of data is prohibitively large for manual
analysis. Data mining is an interactive process requiring that the
intuition and background knowledge of application experts are
coupled with the computational efficiency of modern computer technology.
The Self-Organizing Map (SOM) is one of the most popular neural
network models. The SOM quantizes the data space formed by the
training data and simultaneously performs a topology-preserving
projection of the data onto a regular low-dimensional grid. The grid
can be used efficiently in visualization.
This thesis consists of an introduction and three publications. In
the introduction, an overview of each step of the data mining
process is first presented, primarily based on the CRoss-Industry
Standard Process model for Data Mining (CRISP-DM). Then the SOM
algorithm and some of its variants are introduced, and the use of
SOM in data mining is discussed. The publications deal with
modeling, visualization and clustering of data using the SOM. In
addition, the introduction discusses the use of SOM in summarization.
The SOM is especially suitable for data understanding, but it is a
robust tool suitable for modeling and preparation of data as well.
It offers a convenient workbench which helps in gaining an initial
understanding of the data at hand, and it can be used for
creating some initial models as well.
Keywords: Self-Organizing Map, data mining, knowledge
discovery in databases, visualization, clustering, summarization,
data survey
- Bibtex:
-
@Booklet{vesanto2000licentiate,
title = {Using SOM in Data Mining},
author = {Juha Vesanto},
howpublished = {Licentiate's thesis in the Helsinki University of Technology},
month = {April},
year = {2000},
annote = {}
}
- Prosessin mittauksiin perustuva sulfaattisellun jatkuvatoimisen
keiton analyysi
-
Authors: Esa Alhoniemi
Type: Licentiate's thesis
Description: analysis of kraft pulping process, 88 pages, 1998
Availability: not available
- Notes:
- In Finnish.
- Bibtex:
-
@Booklet{alhoniemi98licentiate,
title = {Prosessin mittauksiin perustuva sulfaattisellun
jatkuvatoimisen keiton analyysi},
author = {Esa Alhoniemi},
howpublished = {Licentiate's thesis in the Helsinki University of Technology},
month = {August},
year = {1998},
annote = {}
}
- Data Mining for Finding Surface Defects in Steel Strips
-
Authors: Jukka Parviainen
Type: Master's thesis
Description: analysis of defects in steel strips, 59 pages, 2000
Availability: [zipped:PS] (1 MB)
- Abstract:
-
Data mining is a collection of methods which build models that depict
the behavior of a system. Data-driven methods have been developed
recently, as processing and storing data on a large scale has become
cheap. Data mining is a user-centered, data-driven, interactive and
iterative process whose stages are specification, data preparation,
data survey, modeling and deployment.
A hot rolled strip is a steel product. Measurements in the rolling
mill production line are recorded into databases. There are
sometimes surface defects on the strip surface. These originate from
casting, rolling, failed descaling or mechanical touch.
The connection between process parameters and surface defects was
examined in the work. The aim was to find a model for controlling
optimal set-up values in order to avoid surface defects, and
calculate a warning of a possible defect. Self-organizing map (SOM)
was used as a data mining tool.
The work is a result of the NEUROLL project and was carried out in the
Laboratory of Computer and Information Science at Helsinki University
of Technology. The project partner was Rautaruukki Steel in Raahe.
Keywords: data mining, hot rolled strip, surface quality, self-organizing map
- Bibtex:
-
@MastersThesis{parviainen00master,
author = {Jukka K. Parviainen},
title = {Data Mining for Finding Surface Defects in Steel Strips},
school = {Helsinki University of Technology},
year = {2000},
month = {September},
annote = {}
}
- Itseorganisoiva kartta jatkuvatoimisen sinkityslinjan ohjauksessa
-
Authors: Henry Stenberg
Type: Master's thesis
Description: process analysis software, 53 pages, 1998
Availability: not available
- Notes:
- In Finnish.
- Bibtex:
-
@MastersThesis{stenberg98master,
author = {Henry Stenberg},
title = {Itseorganisoiva kartta jatkuvatoimisen sinkityslinjan
ohjauksessa},
school = {Helsinki University of Technology},
year = {1998},
month = {April},
annote = {}
}
- Itseorganisoituvaan karttaan perustuva työkalu
ja sen soveltaminen puheludatan analyysiin
-
Authors: Johan Himberg
Type: Master's thesis
Description: phone service data analysis, SOM visualization, 71 pages, 1997
Availability: not available
- Notes:
- In Finnish.
- Bibtex:
-
@MastersThesis{himberg97master,
author = {Johan Himberg},
title = {Itseorganisoituvaan karttaan perustuva {ty\"okalu}
ja sen soveltaminen puheludatan analyysiin},
school = {Helsinki University of Technology},
year = {1997},
month = {October},
annote = {}
}
- Data Mining Techniques Based on the Self-Organizing Map
-
Authors: Juha Vesanto
Type: Master's thesis
Description: data mining, industry analysis, SOM software, 63 pages, 1997
Availability: [HTML] (4 kB), [zipped:PS] (1 MB)
- Abstract:
-
Data mining is a part of a larger area of recent research in
artificial intelligence and information management: knowledge
discovery in databases (KDD). The purpose of KDD is to find new
knowledge from databases in which the dimension, complexity or the
amount of data has so far been prohibitively large for human
observation alone. Data mining refers to the exploratory phase of
knowledge discovery.
The Self-Organizing Map (SOM) is one of the most popular neural
network models. The SOM quantizes the data space formed by the
training data and simultaneously performs a topology-preserving
projection of the data space onto a regular two-dimensional grid. The
SOM also has excellent visualization capabilities including techniques
to give an informative picture of the data space, and techniques to
compare data vectors or whole data sets with each other. The SOM can
also be used for clustering, classification and modeling. The
versatile properties of the SOM make it a valuable tool in data mining
and knowledge discovery.
As part of this work a SOM-based data mining tool was implemented. The
methods and tools presented in the work were used to analyze the pulp
and paper industry worldwide and the Scandinavian industry in more
detail with encouraging results. The analysis of technological data
resulted in 20 major types of pulp and paper mills. Regarding
Scandinavian industry a hierarchical structure of SOMs was used to
combine technological, environmental and economical data.
The work has been done in the Laboratory of Computer and Information
Science at the Helsinki University of Technology as part of the
corporate project Entire in the technology program "Adaptive and
Intelligent Systems Applications". The project was financed by Jaakko
Pöyry Consulting and the Technology Development Center of Finland
(TEKES).
- Bibtex:
-
@MastersThesis{vesanto97master,
author = {Juha Vesanto},
title = {Data Mining Techniques Based on the
Self-Organizing Map},
school = {Helsinki University of Technology},
year = {1997},
month = {May},
url = {http://www.cis.hut.fi/projects/monitor/publications/html/mastersJV97/},
annote = {}
}
- Process Modeling Using the Self-Organizing Map
-
Authors: Jaakko Hollmén
Type: Master's thesis
Description: process analysis, 50 pages, 1996
Availability: not available
- Bibtex:
-
@MastersThesis{hollmen96master,
author = {Jaakko {Hollm\'en}},
title = {Process Modeling Using the
Self-Organizing Map},
school = {Helsinki University of Technology},
year = {1996},
month = {February},
annote = {}
}
- Monitoring of Complex Processes Using the Self-Organizing Map
-
Authors: Esa Alhoniemi
Type: Master's thesis
Description: process analysis, 50 pages, 1995
Availability: not available
- Bibtex:
-
@MastersThesis{alhoniemi95master,
author = {Esa Alhoniemi},
title = {Monitoring of Complex Processes Using the
Self-Organizing Map},
school = {Helsinki University of Technology},
year = {1995},
month = {December},
annote = {}
}

http://www.cis.hut.fi/projects/ide/publications/fulldetails.shtml
ide@mail.cis.hut.fi
Thursday, 19-Aug-2004 19:33:44 EEST