
SUMMARY OF THE PUBLICATIONS

In Publication 1 a complete HMM training method is described, in which codebooks formed with the SOM and LVQ are used to initialize the state output densities of semi-continuous HMMs (SCHMMs). A single large SOM is first trained using unsegmented speech data. The codebook is then labeled with phoneme labels selected by majority voting over segmented speech samples. The segmented speech data is also used for LVQ training to improve the phoneme classification accuracy. The trained codebook is then used to initialize the mixture densities of the SCHMMs, and the final parameters are trained by Baum-Welch re-estimation. The experimental results show that using LVQ for the initialization yields lower final error rates on average. The error-corrective tuning method tested also decreases the average error rates for some HMM configurations.
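As an illustration of the labeling step, the following minimal sketch shows how a trained SOM codebook could be labeled by majority voting over segmented, phoneme-labeled feature vectors; the function and variable names are illustrative assumptions, not the actual software of Publication 1.

    # Majority-vote labeling of a trained SOM codebook (illustrative sketch).
    import numpy as np
    from collections import Counter

    def label_codebook_by_majority_vote(codebook, features, phoneme_labels):
        """codebook: (M, d) SOM reference vectors; features: (N, d) vectors
           from segmented speech; phoneme_labels: length-N phoneme labels."""
        votes = [Counter() for _ in range(len(codebook))]
        for x, label in zip(features, phoneme_labels):
            # Best-matching unit: nearest reference vector in Euclidean distance.
            bmu = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
            votes[bmu][label] += 1
        # Each unit receives the phoneme that hit it most often; units with
        # no hits remain unlabeled (None).
        return [v.most_common(1)[0][0] if v else None for v in votes]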

Publication 2 presents descriptions and comparisons of the versions of the LVQ-based corrective tuning method for both CDHMMs and SCHMMs. According to the experiments, the best version uses the LVQ2 learning law with the segmentation given by the most probable path that visits the correct phonemes in the correct order. Experimental results are also given for varying the number of best-matching mixture Gaussians selected to approximate the state output probabilities. Based on the average error rate, an optimal number, expressed as a fraction of the total number of Gaussians, is proposed for both CDHMMs and SCHMMs.
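The top-k approximation can be sketched as below: only the k best-matching Gaussians of the shared codebook contribute to a state's output probability. Diagonal covariances and all names are assumptions made for illustration, not the exact formulation of Publication 2.

    import numpy as np

    def log_gaussian_diag(x, mean, var):
        # Log density of a diagonal-covariance Gaussian.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def state_output_prob(x, means, variances, state_weights, k):
        """means, variances: (M, d) shared codebook; state_weights: (M,)
           mixture weights of one state; only the k largest densities are used."""
        log_dens = np.array([log_gaussian_diag(x, m, v)
                             for m, v in zip(means, variances)])
        top = np.argsort(log_dens)[-k:]   # indices of the k best-matching Gaussians
        return float(np.sum(state_weights[top] * np.exp(log_dens[top])))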

Publication 3 describes a version of the Gaussian MDHMMs in which the density codebooks are tied phoneme-wise, and motivates this construction by both training and performance considerations. The application of LVQ to the codebook training is presented, and analyses of the codebooks based on different classification statistics are provided to show the effects of SOM and LVQ training. The proposed models are compared to both CDHMMs and SCHMMs with respect to the number of parameters, recognition time, and average error rate.
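The phoneme-wise tying can be pictured as follows: all states of one phoneme HMM share a single Gaussian codebook, and each state keeps only its own mixture weights. The sketch below assumes diagonal covariances and illustrative data structures; it is not the implementation used in Publication 3.

    import numpy as np

    def mdhmm_state_density(x, phoneme_codebooks, phoneme, state_weights):
        """phoneme_codebooks: dict mapping phoneme -> (means, variances), the
           codebook shared by all states of that phoneme; state_weights: the
           mixture weights of one particular state of that phoneme."""
        means, variances = phoneme_codebooks[phoneme]
        log_dens = -0.5 * np.sum(np.log(2.0 * np.pi * variances)
                                 + (x - means) ** 2 / variances, axis=1)
        return float(np.sum(state_weights * np.exp(log_dens)))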

In Publication 4 the segmental LVQ3 training method is presented for the MDHMMs. Equations are given for computing the batch adjustments of the mixture weights and mean vectors. It is shown experimentally that the algorithm gives at least as good results as the previously used combination of maximum likelihood training and corrective tuning, but requires considerably fewer training epochs. Experiments are also reported on how the recognition accuracy of the current system, with a configuration optimized for online operation, could be improved by increasing the number of Gaussians in the mixtures and by increasing the dimensionality of the features with context vectors.
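The flavour of a batch LVQ3-style update of the mean vectors is sketched below; this is a simplified illustration with assumed hyperparameter names and values, not the actual re-estimation equations of Publication 4.

    import numpy as np

    def batch_lvq3_update(means, mean_labels, data, data_labels,
                          alpha=0.05, epsilon=0.1, window=0.3):
        """means: (M, d) codebook mean vectors labeled by mean_labels;
           data: (N, d) training vectors labeled by data_labels."""
        deltas = np.zeros_like(means)
        s = (1.0 - window) / (1.0 + window)        # LVQ window threshold
        for x, y in zip(data, data_labels):
            d = np.linalg.norm(means - x, axis=1)
            i, j = np.argsort(d)[:2]               # two nearest codebook vectors
            if mean_labels[i] == y and mean_labels[j] == y:
                # Both nearest vectors correct: small stabilizing update (LVQ3).
                deltas[i] += epsilon * alpha * (x - means[i])
                deltas[j] += epsilon * alpha * (x - means[j])
            elif (mean_labels[i] == y) != (mean_labels[j] == y) and d[i] / d[j] > s:
                # Exactly one of the two is correct and x lies inside the window.
                c, w = (i, j) if mean_labels[i] == y else (j, i)
                deltas[c] += alpha * (x - means[c])    # attract the correct vector
                deltas[w] -= alpha * (x - means[w])    # repel the incorrect vector
        return means + deltas                      # batch: applied once per epoch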

The objective of Publication 5 is to evaluate ways to improve the recognition speed without losing much accuracy. The methods exploit the topological structure of the density codebooks obtained with training methods based on the SOM. In addition to advanced approximative search methods based on the normal SOM, the incorporation of the tree-search SOM into the MDHMMs is studied. To maintain the topology of the codebooks during MDHMM training, a segmental SOM algorithm is described. The effects of the search acceleration methods are compared using the average error rate on the test database and average recognition speed statistics.
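The speed-up offered by a tree-search organization can be illustrated with a simple two-level search: only the children of the best upper-level unit are examined, so the cost grows roughly with the sum of the level sizes rather than the full codebook size. The layout below is an illustrative assumption, not the SOM_PAK tree-search implementation of Publication 5.

    import numpy as np

    def tree_search_bmu(x, top_units, children):
        """top_units: (T, d) upper-level reference vectors; children: list of
           T arrays, each (C, d), holding the lower-level units of one branch."""
        t = int(np.argmin(np.linalg.norm(top_units - x, axis=1)))   # best top unit
        leaf = int(np.argmin(np.linalg.norm(children[t] - x, axis=1)))
        return t, leaf    # approximate best-matching unit, found in O(T + C) time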

Publication 6 includes a short introduction to and motivation for the development of the training methods for MDHMMs. The segmental LVQ3 is described and compared to the segmental K-means, the corrective LVQ2 tuning, and the segmental GPD. The use of two statistical tests to compare the recognition results and to measure the average training length requirements is presented. The baseline recognition system is described, and experiments covering training modifications, speed enhancements, and feature vector extensions are reported on two different databases.

In Publication 5 the author's contribution was developing the ideas, performing the experiments, and writing the paper. The coauthor was responsible for implementing the tree-search SOM in the SOM_PAK environment and for preliminary experiments to find a suitable tree structure. The adaptation of the tree-search SOM to the MDHMM framework was done by the author. The HMM-based ASR system used in all of the publications, as well as the online prototype system, was prepared by the author following the long ASR tradition of the Laboratory of Information and Computer Science.

Finally, the following simplified list of the novelties of this thesis is given to distinguish the author's own contributions from ideas derived from elsewhere:

