Component scaling is a linear transformation that is performed for each vector component $i$ separately:

$$x_i' = s_i x_i + o_i$$

where $x_i$ is the original value of component $i$ of
the data vector $\mathbf{x}$, $o_i$ is an offset and $s_i$ is a scaling
factor. Component scaling is used to ensure that no component has
excessive influence on the learning results merely because it has a
larger variance or larger absolute values. Typically each component of
the training data set is scaled so that its mean becomes zero and its
variance one. Another option is to scale each component so that its
minimum and maximum values in the data set become zero and one,
respectively. The user can also use component scaling to control the
relative importance of different data components in training.
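
The two scaling choices described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used by ENTIRE: it assumes the transformation has the form scaled = scale * x + offset, and the function names (`zscore_params`, `minmax_params`, `apply_scaling`) and the sample data are hypothetical.

```python
from statistics import fmean, pstdev

def zscore_params(values):
    """Return (scale, offset) so that scale * x + offset has mean 0 and variance 1."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation of the component
    if std == 0:
        raise ValueError("a constant component cannot be scaled to unit variance")
    scale = 1.0 / std
    return scale, -mean * scale

def minmax_params(values):
    """Return (scale, offset) so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant component has no range to scale")
    scale = 1.0 / (hi - lo)
    return scale, -lo * scale

def apply_scaling(values, scale, offset):
    """Apply the linear transformation to every value of one component."""
    return [scale * x + offset for x in values]

# Hypothetical sample data: one component of a small training set.
heights_cm = [150.0, 160.0, 170.0, 180.0]

s, o = zscore_params(heights_cm)
z = apply_scaling(heights_cm, s, o)      # mean 0, variance 1

s2, o2 = minmax_params(heights_cm)
u = apply_scaling(heights_cm, s2, o2)    # values between 0 and 1
```

In this sketch the parameters are computed once from the training data and then applied unchanged to every vector, so all data are scaled consistently. Multiplying the scale and offset of one component by a common factor after normalization would be one way to raise or lower its relative importance.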
An important property of component scaling is that it is reversible. To the user of the data mining tool, the scaled values of the data components are far less meaningful than the original values. ENTIRE keeps track of the scaling factors and offsets applied to the vector components and can thus reverse the preprocessing and report the original, unscaled data values to the user. With non-linear preprocessing methods this is not always possible.
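
As a rough illustration of why storing the scaling factors and offsets is enough to recover the original values, the sketch below inverts a per-component linear transformation. It assumes the form scaled = scale * x + offset; the class `ComponentScaler` and the numbers used are hypothetical and do not represent ENTIRE's actual interface.

```python
class ComponentScaler:
    """Hypothetical helper that stores one (scale, offset) pair per component."""

    def __init__(self, scales, offsets):
        if len(scales) != len(offsets):
            raise ValueError("need one offset per scaling factor")
        if any(s == 0 for s in scales):
            raise ValueError("a zero scaling factor cannot be reversed")
        self.scales = list(scales)
        self.offsets = list(offsets)

    def forward(self, vector):
        """Scale each component: y_i = s_i * x_i + o_i."""
        return [s * x + o for x, s, o in zip(vector, self.scales, self.offsets)]

    def inverse(self, scaled):
        """Undo the scaling: x_i = (y_i - o_i) / s_i."""
        return [(y - o) / s for y, s, o in zip(scaled, self.scales, self.offsets)]

# Made-up parameters for a two-component data vector.
scaler = ComponentScaler(scales=[0.1, 2.0], offsets=[-16.5, 1.0])

original = [172.0, 3.5]
scaled = scaler.forward(original)
restored = scaler.inverse(scaled)  # equals `original` up to rounding error
```

The inverse exists only because every scaling factor is non-zero. A non-linear step that maps several inputs to the same output, such as clipping values to a fixed range, discards information and has no such inverse.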