The mill data set covered 149 Scandinavian pulp and paper mills. Of the 35 variables in this mill-specific data set, 19 were selected for this study: the total pulp and paper production capacities, the numbers of paper machines and coaters, and 15 figures giving the shares of different pulp and paper types in total production.
The environmental data set consisted of 15 components. The first two corresponded to total sulphur (SOx) and nitrogen (NOx) emissions per year, four to environmental loads from pulp production, three to loads from energy production and six to effluent discharges. It should be noted that the measurements did not necessarily come from a single mill. Where several mills were located at the same site, the original measurements of environmental load were a sum over all of them. In such cases the load was divided evenly among the mills, which may not reflect the true distribution of the load.
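The even division of a shared load can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name, the mill and site identifiers and the numbers are all hypothetical, and unknown components are assumed to be represented by `None`.

```python
from collections import defaultdict


def split_site_loads(mill_sites, site_loads):
    """Divide each site's summed load vector evenly among the mills at that site.

    mill_sites: dict mapping mill id -> site id (hypothetical identifiers)
    site_loads: dict mapping site id -> list of load components (None = unknown)
    Returns a dict mapping mill id -> list of per-mill load components.
    """
    # Group the mills by the site they share.
    mills_at_site = defaultdict(list)
    for mill, site in mill_sites.items():
        mills_at_site[site].append(mill)

    mill_loads = {}
    for site, mills in mills_at_site.items():
        if site not in site_loads:
            continue  # no environmental data reported for this site
        n = len(mills)
        for mill in mills:
            # Each mill receives an equal share; unknown components stay unknown.
            mill_loads[mill] = [None if v is None else v / n
                                for v in site_loads[site]]
    return mill_loads


# Made-up example: two mills share site_1, one mill is alone at site_2.
mill_sites = {"mill_a": "site_1", "mill_b": "site_1", "mill_c": "site_2"}
site_loads = {"site_1": [120.0, 30.0], "site_2": [50.0, None]}
mill_loads = split_site_loads(mill_sites, site_loads)
```

An even split treats co-located mills as if they were the same size. Weighting by production capacity would be one possible alternative, but the text does not describe such a scheme.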
The economical data set consisted of 11 financial variables for 19 Scandinavian companies. The variables were: sales, earnings before interest and taxes (EBIT) and fixed investments in US dollars, forest industry share of sales, pulp and paper industry share of sales, sales growth, net indebtedness per sales, equity per total assets, debt per equity, return on capital employed (ROCE) and capital turnover. These are partially the same as those used by Vanharanta et al. A problem with the economical data set was how to relate it to the individual paper mills: each company could own several mills, and there was no way to know what impact a given mill had on the financial situation of its parent company. The problem was circumvented by assigning the data vector of a company, unmodified, to each of its mills. The financial information attached to a mill therefore reflected the financial backing provided by its owner company rather than the financial situation of the mill itself.
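Assigning a company's data vector to each of its mills amounts to a simple lookup by owner. The sketch below illustrates the idea under assumed data structures. The function name, identifiers and values are hypothetical and are not taken from the study.

```python
def attach_company_data(mill_owner, company_data):
    """Copy each company's financial vector, unmodified, to every mill it owns.

    mill_owner: dict mapping mill id -> owning company id (hypothetical ids)
    company_data: dict mapping company id -> list of financial variables
    Returns a dict mapping mill id -> financial vector of its owner.
    """
    mill_financials = {}
    for mill, company in mill_owner.items():
        vector = company_data.get(company)
        if vector is not None:
            # The values are copied as they are: no scaling by mill size.
            mill_financials[mill] = list(vector)
    return mill_financials


# Made-up example: company_x owns two mills, company_y owns one.
mill_owner = {"mill_a": "company_x", "mill_b": "company_x", "mill_c": "company_y"}
company_data = {"company_x": [1000.0, 0.12], "company_y": [400.0, 0.08]}
mill_financials = attach_company_data(mill_owner, company_data)
```

Mills of the same owner end up with identical financial vectors, so this part of the data can only distinguish between companies, not between the mills of one company.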
Of the three data sets used in the study, the technological (mill-specific) data set was by far the best: there was no missing information and the data were highly reliable. The other two data sets were in much worse condition. The environmental data set was of poor quality: only 83% of the Scandinavian mills had any associated environmental information at all, and all component values were known for only 17 mills. The problem with the economical data set was its small size. In addition, the fact that the data reflected the financial situation of the parent company rather than that of the mill itself has probably degraded the results somewhat.
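Coverage figures of this kind (the share of mills with any environmental information and the number of mills with all components known) could be computed along the following lines. This is only a sketch on made-up data: the function name and the `None` convention for unknown values are assumptions.

```python
def coverage(mill_ids, env_data):
    """Return (fraction of mills with any known value, count of complete mills).

    mill_ids: list of all mill ids in the study (hypothetical ids)
    env_data: dict mapping mill id -> list of components (None = unknown);
              mills with no environmental record at all are simply absent.
    """
    any_known = 0
    complete = 0
    for mill in mill_ids:
        values = env_data.get(mill)
        if values is None:
            continue  # no environmental record for this mill
        known = [v for v in values if v is not None]
        if known:
            any_known += 1
            if len(known) == len(values):
                complete += 1
    return any_known / len(mill_ids), complete


# Made-up example with four mills and two components each.
mill_ids = ["a", "b", "c", "d"]
env_data = {"a": [1.0, 2.0], "b": [1.0, None], "c": [None, None]}
fraction_any, n_complete = coverage(mill_ids, env_data)
```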