The Evolving Tree package comes with several helper scripts. This page describes how to use them.
This script normalizes a given data file. It operates on each
dimension independently, removing the mean and scaling the variance
to one. It is run like this: data_normalizer.py datafile.dat
and it writes its output to the file datafile.norm.dat.
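The per-dimension normalization described above can be sketched as follows. This is an illustration only, not the code of data_normalizer.py: the function name normalize_columns is hypothetical, the data is assumed to be already parsed into a list of numeric vectors, and the use of the population variance (dividing by N) and the handling of constant columns are assumptions.

```python
import math


def normalize_columns(rows):
    """Return a copy of rows with every dimension at zero mean and unit variance.

    rows is a list of equal-length lists of floats (hypothetical in-memory
    form of the data file; no file parsing is done here).
    """
    n = len(rows)
    dims = len(rows[0])
    # Mean of each dimension, computed independently of the others.
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        # Population variance (assumption: the real script may divide by n - 1).
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        # A constant dimension has zero variance; leave it unscaled (assumption).
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


vectors = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = normalize_columns(vectors)
```

After the call, each column of normalized has mean zero and variance one, while the input list is left unchanged.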
This script whitens the data using the Karhunen-Loeve transform.
The actual calculations are done with the Numerical Python library,
which must be installed to use this script. It is run just like the
previous script: data_whiter.py datafile.dat
and the output goes to datafile.white.dat.
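As a rough sketch of what whitening involves, the example below centers the data, projects it onto the eigenvectors of its covariance matrix (the Karhunen-Loeve basis), and scales each projected component to unit variance. It uses NumPy and is not the code of the script itself: the function name whiten is hypothetical, and the sketch assumes the covariance matrix has full rank so that no eigenvalue is zero.

```python
import numpy as np


def whiten(data):
    """Whiten data (one vector per row) so that its covariance becomes the identity."""
    data = np.asarray(data, dtype=float)
    # Remove the mean of each dimension.
    centered = data - data.mean(axis=0)
    # Covariance matrix of the dimensions (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns real eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the eigenvectors, then scale each component to unit variance.
    # Assumption: all eigenvalues are strictly positive (full-rank data).
    return (centered @ eigvecs) / np.sqrt(eigvals)


# Correlated example data, generated only for this illustration.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
raw = rng.normal(size=(200, 3)) @ mixing
white = whiten(raw)
```

The covariance matrix of white is the identity matrix up to floating-point error, which is the defining property of whitened data.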
This script randomizes the order of the data vectors in a file. This is
useful because data files are usually ordered (for example, by class),
which may bias training. Running the script is simple:
data_randomizer.py datafile.dat. The output can be found in
datafile.rand.dat.
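The idea can be illustrated with a few lines of Python. This is a sketch under assumptions, not the script's code: the function name shuffle_vectors and its seed parameter are hypothetical, and the input is assumed to be a list holding one data vector per entry, with any header line of the file already set aside.

```python
import random


def shuffle_vectors(lines, seed=None):
    """Return the data vectors in random order without modifying the input list."""
    rng = random.Random(seed)  # a seed makes the order reproducible (illustrative)
    shuffled = list(lines)     # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled


# Example vectors ordered by class label, as data files often are.
vectors = ["0.1 0.2 a", "0.3 0.1 a", "0.9 0.8 b", "0.7 0.9 b"]
shuffled = shuffle_vectors(vectors, seed=42)
```

The shuffled list contains exactly the same vectors as the input; only their order changes.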
ETree does not care about the data vector labels, but some other
software packages do. A common requirement is that every data vector
has a unique label. Often the label is the class, which is not unique.
This script converts a data file datafile.dat with non-unique
labels into a file called datafile.uniq.dat, in which the labels are
unique running IDs. It also creates a file datafile.family
that maps the created labels to the original labels.
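The relabeling step might look roughly like the sketch below. The function name make_unique_labels is hypothetical, the IDs are assumed to be plain running integers starting from zero, and the returned dictionary merely stands in for the contents of the .family file, whose actual format is not described here.

```python
def make_unique_labels(labels):
    """Replace possibly repeated labels with running IDs.

    Returns (unique_ids, family), where family maps each new ID back to
    the original label of the same data vector.
    """
    unique_ids = []
    family = {}
    for index, label in enumerate(labels):
        new_id = str(index)     # running ID; the exact format is an assumption
        unique_ids.append(new_id)
        family[new_id] = label  # remember which original label this ID replaced
    return unique_ids, family


class_labels = ["cat", "dog", "cat", "bird"]
ids, family = make_unique_labels(class_labels)
```

Looking up an ID in family recovers the original class label, which is the purpose of the mapping file.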
This script counts the classes in a given data file and the number of
data vectors in each class. Running statcounter.py data.dat
results in data.dat.stats, which contains this information. You
should not usually need to run this script yourself, since the
cross-validation program runs it when necessary.
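Counting classes and their sizes amounts to a frequency count over the labels, as in this small sketch. The function name class_sizes is hypothetical, the labels are assumed to have been extracted from the data file already, and the layout of the .stats file is not reproduced.

```python
from collections import Counter


def class_sizes(labels):
    """Return a mapping from each class label to the number of vectors carrying it."""
    return Counter(labels)


sizes = class_sizes(["a", "b", "a", "c", "a"])
```

Here sizes holds one entry per class, so its length is the number of classes and each value is that class's size.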
These scripts are used for calculating cross-validation results on a given data set. They have their own documentation page.
Copyright 2004 Jussi Pakkanen, Laboratory of Computer and Information Science, Helsinki University of Technology.