Main Page | Class List | File List | Class Members | File Members

Etree Class Reference

This class defines the Evolving Tree. More...

#include <etree.hh>

List of all members.

Public Member Functions

void Initialize ()
 Initialization.
bool HasTreeGrown ()
 Stopping criteria.
 Etree (DataCache *dc)
 A basic constructor.
void SetParameters (int dT, int dF, double e0, double s0, double t1, double t2, double bdecay)
 Sets the training parameters to the given values.
int FindBMU (const double *datavec) const
 Find the best matching unit.
double CalculateUpdateFactor (const double distance) const
 Calculate the update weight.
void UpdateNodes (int BMU, const double *datavec)
 Moves the fringe nodes towards the data vector.
int TreeDistance (const int node1, const int node2) const
 Calculate the tree distance between nodes.
vector< int > FindTreePath (int nodenum) const
 Find path from root to the specified node.
vector< pair< int, int > > GetSortedFringeDistances (const int nodenum) const
 Get tree distances from the given node.
bool SetDataCache (DataCache &newcache)
 Sets a new data cache.
bool IsLeafNode (unsigned int i) const
 Check for leafnodedness.
void SingleIteration (const double *datavec)
 Trains the tree with one data vector.
void SingleRound (bool exportData=false)
 This function calculates one round of the training algorithm.
void DecayBMUcounts ()
 Decrease BMU counters.
void KmeansAdjust ()
 Fine-tune leaf locations.
void TrunkReshape (int nodenum)
 Optimizes locations of trunk nodes.
void ExportChildrenAsMatlab (string varname, ostream &os)
 Export the current tree as a Matlab matrix.
void ExportShapeAsMatlab (ostream &os)
 Prints every non-leaf node's relationship data as plain text.
bool ReadFromDisk (string fname)
 Read tree data from given file.
int SubtreeSize (int node)
int TotalNodes ()
int FringeSize ()
vector< int > NodesAtDepth (int depth)
 Returns the nodes that are at the specified depth in the tree.
unsigned int MaxDepthNode () const
 Returns the index of the deepest node. Ties broken arbitrarily.
vector< int > GetDepthDistribution () const
int GetMaxDepth () const
 Return the path to the deepest node.
void PrintNewickTree (ostream &out, const int curnodenum=0) const
void PrintTreeInfo (ostream &os) const
 Prints some statistics in text form.
void PrintNodeInfo (int nodenum, ostream &os) const
 Print the relationship info of a given node in plain text.
void PrintNodeProto (int nodenum, ostream &os) const
 Print the prototype vector of a given node as plaintext.
void CreateIndex ()
 Builds the index information.
void ClearIndex ()
vector< int > GetIndexLabels (int i) const
vector< int > GetClassification (const double *queryvec) const
 Seeks the best matches to the given query vector.
void PrintIndexStats (ostream &out) const
 Prints the distribution of mapped data vectors on leaf nodes.

Protected Member Functions

void Unserialize (istream &is)
 Extract the tree structure from a binary mess.
void UpdateNode (int curnode, int prevnode, int BMU, const double *datavec, double distance)
 Moves the fringe nodes towards the data vector.

Protected Attributes

vector< Nodenodes
 A container for the nodes.
vector< int > fringe
int dim
 Dimension of data.
double round
 How manyth iteration is going on.
double eta0
 Initial learning-rate decay value.
double sigma0
 Initial neighborhood decay.
double tau1
 A time constant.
double tau2
 A time constant. See Haykin, pages 450-.
double bmuDecay
 A regularization factor.
int divisionThreshold
 When a node is BMU this many times, it is split.
int divisionFactor
 When split, a node gets this many children.
double weightThreshold
 A cutoff threshold for neighbor updating.
double growingThreshold
 A cutoff threshold for tree growing.
int prevTreeSize
 The size of the tree in the previous round.
DataCachedatavecs
 A container for the training vectors.
vector< vector< int > > labels
 A container for data vector indices that have been mapped to nodes.

Friends

class VisualizerData
ostream & operator<< (ostream &os, Etree &outtree)
 Serializes the tree to the stream.
istream & operator>> (istream &is, Etree &intree)
 Reads an Evolving Tree from a binary stream.


Detailed Description

This class defines the Evolving Tree.

The Evolving Tree contains a group of nodes, the training parameters and the functions necessary to train and query the tree.


Constructor & Destructor Documentation

Etree::Etree DataCache dc  ) 
 

A basic constructor.

Sets all values to sensible guesstimates. For better performance, tweak them by hand.

Parameters:
dc A data cache holding the training data.


Member Function Documentation

double Etree::CalculateUpdateFactor const double  distance  )  const
 

Calculate the update weight.

Figure out how much to update a node towards a data vector. Has exponential neighborhood and decay.

Parameters:
distance The tree distance between the BMU and the node to be updated.
Returns:
An update factor, between 0 and 1.

void Etree::CreateIndex  ) 
 

Builds the index information.

This should be called after training, but before querying.

void Etree::DecayBMUcounts  ) 
 

Decrease BMU counters.

Multiplies all the BMU counters of the leaves by regularization factor.

void Etree::ExportChildrenAsMatlab string  varname,
ostream &  os
 

Export the current tree as a Matlab matrix.

The matrix is built from rows of node prototype vectors. Only exports the nodes that are in the fringe.

Parameters:
varname the name of the Matlab matrix to write.

void Etree::ExportShapeAsMatlab ostream &  os  ) 
 

Prints every non-leaf node's relationship data as plain text.

Uses the following format.

numberofelements
numberofchildren
parent
child1
child2
child3
...
parent
child1
...

Parameters:
os The output stream.

int Etree::FindBMU const double *  datavec  )  const
 

Find the best matching unit.

Starting at the root, progresses down along the path until a leaf node is found.

Note that this assumes nodes[0] is the root. Being a leaf node is equivalent to having zero children.

vector< int > Etree::FindTreePath int  nodenum  )  const
 

Find path from root to the specified node.

Returns a vector of integers that tell the path from the root node to the node specified along the tree. The first element is the root and the last element is the node in question.

Parameters:
nodenum Index to the node to be calculated.
Returns:
The path.

vector< int > Etree::GetClassification const double *  queryvec  )  const
 

Seeks the best matches to the given query vector.

Finds the BMU, and returns the labels of the data vectors mapped to it.

Parameters:
queryvec The data vector containing the query, assumed to have the correct dimension.
Returns:
Index list of training vectors that have been mapped to the BMU.

vector< int > Etree::GetDepthDistribution  )  const
 

Returns vector indicating how many nodes are at each level of the tree.

vector< pair< int, int > > Etree::GetSortedFringeDistances const int  nodenum  )  const
 

Get tree distances from the given node.

This function calculates the tree distances between the given node and all fringe nodes. The result is sorted according to distance.

The first element in pair<int, int> is the distance and the second one is the index of the node. This is so that STL's sort can be used directly.

Parameters:
nodenum Index to the node
Returns:
The list of distances.

bool Etree::HasTreeGrown  ) 
 

Stopping criteria.

Check if the tree has grown between rounds. Can be used to stop the training.

Returns:
true if tree has grown enough otherwise false

void Etree::Initialize  ) 
 

Initialization.

Initializes the tree and its root node.

bool Etree::IsLeafNode unsigned int  i  )  const
 

Check for leafnodedness.

Returns:
true if the given node is a leaf node, false otherwise.

void Etree::KmeansAdjust  ) 
 

Fine-tune leaf locations.

Simple k-means optimization. Map all nodes to leafs, and then place the node to their center of mass.

vector< int > Etree::NodesAtDepth int  depth  ) 
 

Returns the nodes that are at the specified depth in the tree.

Depth of 0 means root, 1 means the first level and so on.

Returns:
The node list.

void Etree::PrintIndexStats ostream &  out  )  const
 

Prints the distribution of mapped data vectors on leaf nodes.

Note that you must call CreateIndex before calling this.

bool Etree::ReadFromDisk string  fname  ) 
 

Read tree data from given file.

Datacache must be set to something proper beforehand.

Returns:
false on failure, meaning file not found or data dimensions disagree.

bool Etree::SetDataCache DataCache newcache  ) 
 

Sets a new data cache.

Returns:
true on success and false if tree is non-empty and data vector sizes differ.

void Etree::SingleIteration const double *  datavec  ) 
 

Trains the tree with one data vector.

This takes a datavector and does one iteration of the training algorithm. It also splits the BMU should that be necessary. It does not update the round variable, because one round consists of many iterations.

Parameters:
datavec The data vector to use in training.

void Etree::SingleRound bool  exportData = false  ) 
 

This function calculates one round of the training algorithm.

In a nutshell: shuffle data vectors, run SingleIteration on them. Stop.

Parameters:
exportDate If true, log every step to a file in Matlab form.

int Etree::TreeDistance const int  node1,
const int  node2
const
 

Calculate the tree distance between nodes.

Figuring out the tree distance is quite simple. First we find out the paths from the nodes to root. Then we find the place where they differ. Then we just add the lengths of the remaining paths and subtract one.

Parameters:
node1 Index to the first node.
node2 Index to the second node.
Returns:
The tree distance.

void Etree::TrunkReshape int  nodenum  ) 
 

Optimizes locations of trunk nodes.

A bottom-up recursion that places trunk nodes to the center of mass of their children.

Parameters:
nodenum The index to the node to adjust. Should point to root when called from outside this function.

void Etree::UpdateNode int  curnode,
int  prevnode,
int  BMU,
const double *  datavec,
double  distance
[protected]
 

Moves the fringe nodes towards the data vector.

Recursively goes through the tree and updates weights of the leaf nodes.

Parameters:
curnode A current node being examined.
prevnode A previously examined node
BMU Index of the best matching unit
datavec The data vector to update towards.
distance A number of hops between the current node and the BMU

void Etree::UpdateNodes int  BMU,
const double *  datavec
 

Moves the fringe nodes towards the data vector.

Updates the fringe node locations towards the data vector.

Parameters:
BMU Index of the best matching unit
datavec The data vector to update towards.


Friends And Related Function Documentation

ostream& operator<< ostream &  os,
Etree outtree
[friend]
 

Serializes the tree to the stream.

Note that datacache is not exported. Also no magic numbers or checksums are used.

The order is as follows. First the constants in the order they appear in the class declaration. Next comes the fringe, then the serialized nodes.

Parameters:
os The output stream.
outtree The tree to output

istream& operator>> istream &  is,
Etree intree
[friend]
 

Reads an Evolving Tree from a binary stream.

No checking of any kind is done.

Parameters:
is The input stream.
intree The tree to write the data to.


Member Data Documentation

vector<int> Etree::fringe [protected]
 

Indices to bottom layer nodes. Only these may be BMUs.


The documentation for this class was generated from the following files:
Generated on Wed May 17 15:43:41 2006 for The Evolving Tree by  doxygen 1.4.1