
Component scaling:

Component scaling is a linear transformation that is performed for each vector component i separately:

$$x_i' = \frac{x_i - o_i}{s_i}$$

where $x_i$ is the original value of component i of the data vector $\mathbf{x}$, $o_i$ is an offset and $s_i$ is a scaling factor. Component scaling is used to ensure that no component has excessive influence on the learning results just because of its greater variance or bigger absolute value. Typically each component of the training data set is scaled so that its mean becomes zero and its variance one. Another option is to scale each component so that its minimum and maximum values in the data set become zero and one, respectively. The user can also use component scaling to control the relative importance of different data components in training.
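The two scaling choices described above can be sketched in a few lines of Python. This is a minimal illustration, not ENTIRE's implementation: it assumes the transformation has the form (x - offset) / scale, and the function names `variance_scaling`, `range_scaling` and `scale_components` are hypothetical.

```python
from statistics import fmean, pstdev


def variance_scaling(column):
    """Offset and scale that give the column zero mean and unit variance."""
    offset = fmean(column)
    scale = pstdev(column)  # population standard deviation
    if scale == 0:
        scale = 1.0  # constant component: leave its spread unchanged
    return offset, scale


def range_scaling(column):
    """Offset and scale that map the column's minimum to 0 and maximum to 1."""
    lo, hi = min(column), max(column)
    scale = hi - lo
    if scale == 0:
        scale = 1.0  # constant component: avoid division by zero
    return lo, scale


def scale_components(data, fit):
    """Scale every component (column) of the data set separately.

    Assumes the form x' = (x - offset) / scale for each component.
    Returns the scaled data and the (offset, scale) pair of each component.
    """
    columns = list(zip(*data))
    params = [fit(col) for col in columns]
    scaled = [
        [(x - o) / s for x, (o, s) in zip(row, params)]
        for row in data
    ]
    return scaled, params


# Two components with very different magnitudes.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
unit_var, var_params = scale_components(data, variance_scaling)
unit_range, range_params = scale_components(data, range_scaling)
```

After either scaling both components span comparable ranges, so the second one no longer dominates a distance computation simply because its raw values are larger.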

An important property of component scaling is that it is reversible. To the user of the data mining tool, the scaled values of the data components are far less meaningful than the original values. ENTIRE therefore keeps track of the scaling factors and offsets applied to the vector components, so it can reverse the preprocessing and report the original, unscaled data values to the user. With non-linear preprocessing methods this is not always possible.
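The reversibility can be illustrated with a short Python sketch. It assumes that scaling takes the form (x - offset) / scale and that the per-component (offset, scale) pairs are stored alongside the data; the names `scale` and `unscale` are hypothetical and do not describe ENTIRE's actual interface.

```python
def scale(row, params):
    """Apply (x - offset) / scale to each component of one data vector."""
    return [(x - o) / s for x, (o, s) in zip(row, params)]


def unscale(row, params):
    """Invert the scaling: x = x' * scale + offset for each component."""
    return [y * s + o for y, (o, s) in zip(row, params)]


# Stored (offset, scale) pair for each of two components (example values).
params = [(2.0, 0.5), (100.0, 400.0)]

original = [3.0, 300.0]
scaled = scale(original, params)
restored = unscale(scaled, params)
```

Because the stored pairs fully determine the inverse, any scaled vector can be mapped back to original units, provided every scaling factor is non-zero.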

Juha Vesanto
Tue May 27 12:40:37 EET DST 1997