SOM Toolbox Online documentation http://www.cis.hut.fi/projects/somtoolbox/

som_read_data

Purpose

 Reads data from an ascii file in SOM_PAK format.

Syntax

Description

 This function is offered for compatibility with SOM_PAK, a SOM software
 package in C. It reads data from a file in SOM_PAK format.

 The SOM_PAK data file format is as follows. The first line must
 contain the input space dimension and nothing else. The following
 lines are comment lines, empty lines or data lines. Unlike programs
 in SOM_PAK, this function can also determine the input dimension
 from the first data lines, if the input space dimension line is
 missing.  Note that the SOM_PAK format is not fully supported: data
 vector 'weight' and 'fixed' properties are ignored (they are treated
 as labels).

 Each data line contains one data vector and its labels. From the beginning
 of the line, first are values of the vector components separated by
 whitespaces, then labels also separated by whitespaces. If there are
 missing values in the vector, the missing value marker needs to be
 specified as the last input argument ('NaN' by default). The missing
 values are stored as NaNs in the data struct. 

 Comment lines start with '#'. Comment lines as well as empty lines are
 ignored, except if the comment lines that start with '#n' or '#l'. In that
 case the line should contain names of the vector components or label names
 separated by whitespaces.

 NOTE: The minimum value Matlab is able to deal with (realmax)
 should not appear in the input file. This is because function sscanf is
 not able to read NaNs: the NaNs are in the read phase converted to value
 realmax.

Required input arguments

  filename    (string) input filename

Optional input arguments

  dim         (scalar) input space dimension
  missing     (string) string used to denote missing components (NaNs); 
                       default is 'NaN'

Output arguments

  sD   (struct) the resulting data struct

Examples

 The basic usage is:
  sD = som_read_data('system.data');

 If you know the input space dimension beforehand, and the file does
 not contain it on the first line, it helps if you specify it as the
 second argument: 
  sD = som_read_data('system.data',9);

 If the missing components in the data are marked with some other
 characters than with 'NaN', you can specify it with the last argument: 
  sD = som_read_data('system.data',9,'*')
  sD = som_read_data('system.data','NaN')

 Here's an example data file:

 5
 #n one two three four five
 #l ID
 10 2 3 4 5 1stline label
 0.4 0.3 0.2 0.5 0.1 2ndline label1 label2
 # comment line: missing components are indicated by 'x':s
 1 x 1 x 1 3rdline missing_components
 x 1 2 2 2 
 x x x x x 5thline emptyline

See also

som_write_data Writes data structs/matrices to a file in SOM_PAK format.
som_read_cod Read a map from a file in SOM_PAK format.
som_write_cod Writes data struct into a file in SOM_PAK format.
som_data_struct Creates data structs.