Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Randomization Methods for Assessing Data Mining Results on Matrices

This page contains the datasets and the implementations of the methods described in the following papers:
Markus Ojala:
Assessing Data Mining Results on Matrices with Randomization.
In ICDM'10: Proceedings of the 10th IEEE International Conference on Data Mining, pp. 959-964.
Markus Ojala, Niko Vuokko, Aleksi Kallio, Niina Haiminen, Heikki Mannila:
Randomization Methods for Assessing Data Analysis Results on Real-Valued Matrices.
In Statistical Analysis and Data Mining, 2(4):209-230, 2009.

In these papers, a data mining result is considered interesting if it is not explained by the row and column value distributions of the data matrix. This page provides randomization methods that produce random matrices approximately sharing the row and column statistics of the original matrix.
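The idea can be illustrated with a minimal sketch of an empirical p-value: the statistic measured on the original matrix is compared with the same statistic measured on randomized matrices. The class name, the method name, and the numbers below are hypothetical and are not part of the downloadable implementations; the sketch assumes that larger values of the statistic are more extreme and uses the common (r + 1) / (n + 1) correction.

```java
final class EmpiricalPValue {

    /**
     * One-sided empirical p-value with the +1 correction.
     * observed:   statistic computed on the original matrix
     * randomized: the same statistic computed on each randomized matrix
     * Assumption: larger values of the statistic are more extreme.
     */
    static double pValue(double observed, double[] randomized) {
        int atLeastAsExtreme = 0;
        for (double value : randomized) {
            if (value >= observed) {
                atLeastAsExtreme++;
            }
        }
        return (atLeastAsExtreme + 1.0) / (randomized.length + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical statistic values; in practice they would come from
        // running the same data mining method on each randomized matrix.
        double[] randomized = {0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10, 0.14};
        double observed = 0.42;
        System.out.println("Empirical p-value: " + pValue(observed, randomized));
    }
}
```

A small p-value suggests that the result is not explained by the row and column statistics alone. The smallest attainable value is 1 / (n + 1), so the number of randomized matrices limits the resolution of the test.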

Updates

Datasets

All generated, artificial datasets in a zip archive (SAM paper):
Links to pages where the real datasets used in the experiments can be downloaded:

Implementations

The randomization methods are implemented in Java 1.5 and integrated with Matlab, so the Java virtual machine (JVM) that Matlab uses must be version 1.5 or newer. If your Matlab version is older than 7.5 (i.e., R2007b), Matlab has to be configured to use a newer JVM; see Matlab support for more help. To increase the heap space for the JVM, see Matlab support.
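As a quick way to check these requirements, the following sketch reports the version and the maximum heap size of the JVM it runs in. The class and method names are hypothetical and are not part of the downloadable archives; only the standard calls System.getProperty("java.version") and Runtime.getRuntime().maxMemory() are used.

```java
final class JvmCheck {

    /** Parses the major version from strings such as "1.5.0_22", "1.6" or "17.0.2". */
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // Old-style version strings start with "1.", e.g. "1.5.0_22" means Java 5.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true if the version string denotes Java 1.5 or newer. */
    static boolean isAtLeastJava5(String version) {
        return majorVersion(version) >= 5;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Java version: " + version);
        System.out.println("Java 1.5 or newer: " + isAtLeastJava5(version));
        System.out.println("Maximum heap (MB): " + maxHeapMb);
    }
}
```

The same two standard Java calls can also be evaluated from the Matlab prompt to inspect the JVM that Matlab itself is using.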

SwapConstrained implementation (newer; use this unless you have a special reason to use GeneralMetropolis)

The ICDM 2010 paper introduced a new SwapConstrained method that needs no manual tuning of parameters and supports matrices containing:

To use the methods, download and unzip the following archive and read README.txt. Then call "help swap" and "help discretize" in Matlab to get started.

SAM implementations (less versatile SwapDiscretized, includes GeneralMetropolis)

The SAM paper gave methods for randomizing real-valued matrices so that the randomized matrices have features similar to the original. To use the methods, download and unzip the following archives, start Matlab, and call "help randomizeMatrix" for more information. See also the SAM article for reference.



Page maintained by Markus.Ojala at tkk.fi, last updated Monday, 20-Dec-2010 08:30:25 EET