Evolving Tree helper scripts

The Evolving Tree package comes with several helper scripts. This page describes how to use them.

data_normalizer.py

This script normalizes a given data file. It processes each dimension independently, subtracting the mean and scaling the variance to one. Run it as data_normalizer.py datafile.dat; the output is written to datafile.norm.dat.
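The per-dimension normalization can be sketched as follows. This is an illustration, not the script's actual code. It assumes the vectors have already been parsed from the data file into lists of floats, and it uses the population variance (dividing by n). The function name normalize is hypothetical.

```python
import math

def normalize(vectors):
    """Scale each dimension to zero mean and unit variance (sketch).

    Assumes `vectors` is a list of equal-length lists of floats already
    parsed from the data file; the file format is not handled here.
    Uses the population variance (divide by n), which is an assumption.
    """
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((v[d] - means[d]) ** 2 for v in vectors) / n
        # Constant dimensions are centred but not scaled, to avoid dividing by zero.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vectors]

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(normalize(data))
```

Each dimension is handled on its own, so a feature measured in large units no longer dominates distance computations.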

data_whiter.py

This script whitens the data using the Karhunen-Loeve transform. The computations are done with the Numerical Python library, which must be installed to use this script. It is run in the same way as the previous script, data_whiter.py datafile.dat, and the output goes to datafile.white.dat.
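A sketch of the whitening step follows. It uses NumPy, the successor of Numerical Python, rather than the original library. The data are centred, the covariance matrix is eigendecomposed, the centred data are rotated onto the eigenvectors (the Karhunen-Loeve transform), and each component is scaled to unit variance. The function name whiten and the small eps guard are assumptions. The real script may differ in details such as the covariance normalization.

```python
import numpy as np

def whiten(X, eps=1e-12):
    """Whiten the rows of X with the Karhunen-Loeve transform (sketch).

    Assumes X is an (n_samples, n_dims) array. eps is a hypothetical
    guard against division by zero for (near-)zero eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # remove the mean of each dimension
    C = Xc.T @ Xc / X.shape[0]              # population covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # C is symmetric, so eigh applies
    # Rotate onto the eigenvector basis, then scale each component to unit variance.
    return (Xc @ eigvecs) / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
raw = rng.normal(size=(500, 3)) @ mix       # correlated test data
white = whiten(raw)
print(np.round(np.cov(white, rowvar=False, bias=True), 6))
```

After whitening, the covariance matrix of the output is (up to rounding) the identity, so the dimensions are uncorrelated and have unit variance.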

data_randomizer.py

This script randomizes the order of the data vectors in a file. This is useful because data files are often ordered (for example, grouped by class), which may cause biases. Run it as data_randomizer.py datafile.dat; the output is written to datafile.rand.dat.
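The shuffling idea can be sketched in a few lines. This assumes that each data vector occupies one line and that any header line has already been split off so it stays in place. The function name shuffle_lines and its optional seed parameter are hypothetical additions for reproducible runs; the original script is not known to accept a seed.

```python
import random

def shuffle_lines(lines, seed=None):
    """Return the data lines in random order (sketch).

    Assumes one data vector per line with any header already removed.
    `seed` is a hypothetical option that makes the order reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(lines)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled

lines = ["0.1 0.2 classA", "0.3 0.4 classA", "0.5 0.6 classB", "0.7 0.8 classB"]
print(shuffle_lines(lines, seed=42))
```

Shuffling whole lines keeps each vector together with its label, so only the order changes.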

data_splitter.py

ETree does not care about data vector labels, but some other software packages do. A common requirement is that every data vector has a unique label, whereas the label is often the class, which is not unique. This script converts a data file datafile.dat with non-unique labels into one with unique running IDs, called datafile.uniq.dat. It also creates a file datafile.family that maps the new labels to the original ones.
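The relabelling can be sketched as below. The records are assumed to be already parsed into (values, label) pairs. The ID scheme (consecutive integers starting at 1, stored as strings) and the function name make_unique_labels are hypothetical, and the on-disk layout of datafile.family is not reproduced.

```python
def make_unique_labels(records):
    """Replace possibly repeated labels with unique running IDs (sketch).

    `records` is a list of (values, label) pairs. Returns the relabelled
    records plus a dict mapping each new ID to its original label, which
    is the kind of information datafile.family holds. The ID scheme is a
    hypothetical choice.
    """
    relabelled = []
    family = {}
    for i, (values, label) in enumerate(records, start=1):
        new_id = str(i)
        relabelled.append((values, new_id))
        family[new_id] = label
    return relabelled, family

data = [([0.1, 0.2], "cat"), ([0.3, 0.4], "dog"), ([0.5, 0.6], "cat")]
uniq, family = make_unique_labels(data)
print(uniq)    # labels are now "1", "2", "3"
print(family)  # {'1': 'cat', '2': 'dog', '3': 'cat'}
```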

statcounter.py

This script determines the classes in a given data file and the size of each class. Running statcounter.py data.dat produces data.dat.stats, which contains this information. You should not normally need to run this script, since the cross-validation program runs it when necessary.
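Counting class sizes amounts to tallying labels. The sketch below assumes the class labels have already been read from the data file. The function name class_sizes is hypothetical, and the layout of the real .stats file is not reproduced.

```python
from collections import Counter

def class_sizes(labels):
    """Count how many data vectors belong to each class (sketch).

    Assumes the class labels have already been read from the data file.
    """
    return dict(Counter(labels))

labels = ["cat", "dog", "cat", "bird", "cat"]
for name, size in sorted(class_sizes(labels).items()):
    print(name, size)
```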

make_params.py, param_rotate.py, single_cv.py

These scripts are used to compute cross-validation results on a given data set. They are described on a separate documentation page.


Copyright 2004 Jussi Pakkanen, Laboratory of Computer and Information Science, Helsinki University of Technology.