SOM Toolbox | Online documentation | http://www.cis.hut.fi/projects/somtoolbox/ |

Initialize, apply and undo normalizations on a given vector of scalar values.

xnew = som_norm_variable(x,method,operation) xnew = som_norm_variable(x,sNorm,operation) [xnew,sNorm] = som_norm_variable(...)

This function is used to initialize, apply and undo normalizations on scalar variables. It is the low-level function that upper-level functions SOM_NORMALIZE and SOM_DENORMALIZE utilize to actually (un)do the normalizations. Normalizations are typically performed to control the variance of vector components. If some vector components have variance which is significantly higher than the variance of other components, those components will dominate the map organization. Normalization of the variance of vector components (method 'var') is used to prevent that. In addition to variance normalization, other methods have been implemented as well (see list below). Usually normalizations convert the variable values so that they no longer make any sense: the values are still ordered, but their range may have changed so radically that interpreting the numbers in the original context is very hard. For this reason all implemented methods are (more or less) revertible. The normalizations are monotonic and information is saved so that they can be undone. Also, the saved information makes it possible to apply the EXACTLY SAME normalization to another set of values. The normalization information is determined with 'init' operation, while 'do' and 'undo' operations are used to apply or revert the normalization. The normalization information is saved in a normalization struct, which is returned as the second argument of this function. Note that normalization operations may be stacked. In this case, normalization structs are positioned in a struct array. When applied, the array is gone through from start to end, and when undone, in reverse order. method description 'var' Variance normalization. A linear transformation which scales the values such that their variance=1. This is convenient way to use Mahalanobis distance measure without actually changing the distance calculation procedure. 'range' Normalization of range of values. A linear transformation which scales the values between [0,1]. 'log' Logarithmic normalization. In many cases the values of a vector component are exponentially distributed. This normalization is a good way to get more resolution to (the low end of) that vector component. What this actually does is a non-linear transformation: x_new = log(x_old - m + 1) where m=min(x_old) and log is the natural logarithm. Applying the transformation to a value which is lower than m-1 will give problems, as the result is then complex. If the minimum for values is known a priori, it might be a good idea to initialize the normalization with [dummy,sN] = som_norm_variable(minimum,'log','init'); and normalize only after this: x_new = som_norm_variable(x,sN,'do'); 'logistic' or softmax normalization. This normalization ensures that all values in the future, too, are within the range [0,1]. The transformation is more-or-less linear in the middle range (around mean value), and has a smooth nonlinearity at both ends which ensures that all values are within the range. The data is first scaled as in variance normalization: x_scaled = (x_old - mean(x_old))/std(x_old) and then transformed with the logistic function x_new = 1/(1+exp(-x_scaled)) 'histD' Discrete histogram equalization. Non-linear. Orders the values and replaces each value by its ordinal number. Finally, scales the values such that they are between [0,1]. Useful for both discrete and continuous variables, but as the saved normalization information consists of all unique values of the initialization data set, it may use considerable amounts of memory. If the variable can get more than a few values (say, 20), it might be better to use 'histC' method below. Another important note is that this method is not exactly revertible if it is applied to values which are not part of the original value set. 'histC' Continuous histogram equalization. Actually, a partially linear transformation which tries to do something like histogram equalization. The value range is divided to a number of bins such that the number of values in each bin is (almost) the same. The values are transformed linearly in each bin. For example, values in bin number 3 are scaled between [3,4[. Finally, all values are scaled between [0,1]. The number of bins is the square root of the number of unique values in the initialization set, rounded up. The resulting histogram equalization is not as good as the one that 'histD' makes, but the benefit is that it is exactly revertible - even outside the original value range (although the results may be funny). 'eval' With this method, freeform normalization operations can be specified. The parameter field contains strings to be evaluated with 'eval' function, with variable name 'x' representing the variable itself. The first string is the normalization operation, and the second is a denormalization operation. If the denormalization operation is empty, it is ignored.

x (vector) The scalar values to which the normalization operation is applied. method The normalization specification. (string) Identifier for a normalization method: 'var', 'range', 'log', 'logistic', 'histD' or 'histC'. Corresponding default normalization struct is created. (struct) normalization struct (struct array) of normalization structs, applied to x one after the other (cellstr) of length (cellstr array) first string gives normalization operation, and the second gives denormalization operation, with x representing the variable, for example: {'x+2','x-2}, or {'exp(-x)','-log(x)'} or {'round(x)'}. Note that in the last case, no denorm operation is defined. note: if the method is given as struct(s), it is applied (done or undone, as specified by operation) regardless of what the value of '.status' field is in the struct(s). Only if the status is 'uninit', the undoing operation is halted. Anyhow, the '.status' fields in the returned normalization struct(s) is set to approriate value. operation (string) The operation to perform: 'init' to initialize the normalization struct, 'do' to perform the normalization, 'undo' to undo the normalization, if possible. If operation 'do' is given, but the normalization struct has not yet been initialized, it is initialized using the given data (x).

x (vector) Appropriately processed values. sNorm (struct) Updated normalization struct/struct array. If any, the '.status' and '.params' fields are updated.

To initialize and apply a normalization on a set of scalar values: [x_new,sN] = som_norm_variable(x_old,'var','do'); To just initialize, use: [dummy,sN] = som_norm_variable(x_old,'var','init'); To undo the normalization(s): x_orig = som_norm_variable(x_new,sN,'undo'); Typically, normalizations of data structs/sets are handled using functions SOM_NORMALIZE and SOM_DENORMALIZE. However, when only the values of a single variable are of interest, SOM_NORM_VARIABLE may be useful. For example, assume one wants to apply the normalization done on a component (i) of a data struct (sD) to a new set of values (x) of that component. With SOM_NORM_VARIABLE this can be done with: x_new = som_norm_variable(x,sD.comp_norm{i},'do'); Now, as the normalizations in sD.comp_norm{i} have already been initialized with the original data set (presumably sD.data), the EXACTLY SAME normalization(s) can be applied to the new values. The same thing can be done with SOM_NORMALIZE function, too: x_new = som_normalize(x,sD.comp_norm{i}); Or, if the new data set were in variable D - a matrix of same dimension as the original data set: D_new = som_normalize(D,sD,i);

som_normalize
| Add/apply/redo normalizations for a data struct/set. |

som_denormalize
| Undo normalizations of a data struct/set. |