Laboratory of Computer and Information Science / Neural Networks Research Centre (CIS Lab), Helsinki University of Technology

Morfessor demonstration

This demonstration produces a morph segmentation of the words you type in. The program offers several segmentation models and training corpora; choose one of each. To run larger-scale experiments on your own data, download the Morfessor software.


Write words to analyze:


Select model type and training corpus:

Model type / Corpus

Categories-ML:

Probabilistic morph segmentation model that learns morph categories (prefix, stem, and suffix) and the sequential dependencies between them. It produces a better segmentation than the baseline model when evaluated against a morphological "gold-standard" segmentation.
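
One common way to model categories with sequential dependencies is a first-order hidden Markov model over the categories. The sketch below is only an illustration of that idea, not the Morfessor implementation: the transition and emission probabilities are hypothetical hand-set values (a real model would learn them from a corpus), and `viterbi_tags` is a made-up helper name. It assigns the most probable category sequence to a non-empty, already segmented word.

```python
import math

CATEGORIES = ["PRE", "STM", "SUF"]

# Hypothetical transition probabilities; "#" marks the word boundary.
TRANS = {
    "#":   {"PRE": 0.3,  "STM": 0.7,  "SUF": 0.0, "#": 0.0},
    "PRE": {"PRE": 0.1,  "STM": 0.9,  "SUF": 0.0, "#": 0.0},
    "STM": {"PRE": 0.05, "STM": 0.15, "SUF": 0.4, "#": 0.4},
    "SUF": {"PRE": 0.0,  "STM": 0.0,  "SUF": 0.3, "#": 0.7},
}

# Hypothetical emission probabilities P(morph | category).
EMIT = {
    "PRE": {"un": 0.6, "re": 0.4},
    "STM": {"walk": 0.4, "talk": 0.4, "able": 0.1, "un": 0.05, "re": 0.05},
    "SUF": {"ed": 0.4, "ing": 0.4, "able": 0.2},
}


def log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_tags(morphs):
    """Return the most probable category sequence for a non-empty morph list."""
    # best[c] = (log probability of the best path ending in c, that path)
    best = {
        c: (log(TRANS["#"][c]) + log(EMIT[c].get(morphs[0], 0.0)), [c])
        for c in CATEGORIES
    }
    for morph in morphs[1:]:
        new_best = {}
        for c in CATEGORIES:
            emit = log(EMIT[c].get(morph, 0.0))
            score, path = max(
                ((best[p][0] + log(TRANS[p][c]) + emit, best[p][1])
                 for p in CATEGORIES),
                key=lambda item: item[0],
            )
            new_best[c] = (score, path + [c])
        best = new_best
    # Close the word with a transition back to the boundary symbol.
    _, path = max(
        ((best[c][0] + log(TRANS[c]["#"]), best[c][1]) for c in CATEGORIES),
        key=lambda item: item[0],
    )
    return path


print(viterbi_tags(["re", "talk", "ing"]))
print(viterbi_tags(["walk", "ed"]))
```

Because a suffix can never directly follow the word boundary in this toy table, the sequential dependencies rule out taggings such as a word that starts with a suffix, which a model without categories cannot express.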

Baseline:

Morph segmentation model inspired by the Minimum Description Length principle.
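
To give a feel for the Minimum Description Length idea, the following sketch compares two segmentations of a tiny word list by a toy two-part code length: the bits needed to spell out the morph lexicon plus the bits needed to encode the corpus as a sequence of morphs. This is an assumption-laden illustration, not the cost function used by Morfessor; `description_length` and the `alphabet_size` default (26 letters plus an end-of-morph symbol) are hypothetical choices.

```python
import math
from collections import Counter


def description_length(segmented_words, alphabet_size=27):
    """Toy two-part code length in bits: lexicon cost plus corpus cost."""
    tokens = [morph for word in segmented_words for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon cost: spell each distinct morph letter by letter, plus one
    # end-of-morph symbol, at log2(alphabet_size) bits per symbol.
    lexicon_bits = sum((len(m) + 1) * math.log2(alphabet_size) for m in counts)
    # Corpus cost: each morph token costs -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


unsegmented = [["walked"], ["walking"], ["talked"], ["talking"]]
segmented = [["walk", "ed"], ["walk", "ing"], ["talk", "ed"], ["talk", "ing"]]

print(round(description_length(unsegmented), 1), "bits without segmentation")
print(round(description_length(segmented), 1), "bits with segmentation")
```

In this toy example the segmented version is cheaper, because reusing the morphs walk, talk, ed, and ing shrinks the lexicon by more than the extra morph tokens add to the corpus cost.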


Finnish:

1 million word forms extracted from a 16 million word corpus

68 000 word forms extracted from a 250 000 word corpus

English:

110 000 word forms extracted from a 12 million word corpus

21 000 word forms extracted from a 250 000 word corpus

Swedish:

90 000 word forms extracted from a 1 million word corpus

For languages with more than one corpus size (Finnish and English), you can switch between the corpora to see how the amount of training data affects the resulting segments.
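
The corpus descriptions above distinguish the number of running words in a corpus from the number of distinct word forms extracted from it. A minimal sketch of that distinction, assuming simple lowercasing and whitespace tokenization (the preprocessing actually used for these corpora is not described here, and `word_forms` is a hypothetical helper):

```python
from collections import Counter


def word_forms(text):
    """Count how often each distinct word form occurs in the text."""
    return Counter(text.lower().split())


corpus = "the cat sat on the mat and the dog sat too"
forms = word_forms(corpus)

print(sum(forms.values()), "running words in the corpus")
print(len(forms), "distinct word forms")
```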


Page maintained by morpho at mail.cis.hut.fi, last updated Thu Jun 4 13:00:49 2009