In this demonstration, words given by the user are analyzed using a segmentation model learned from a text file. After the analysis, you can search the model for related words and morphs.
First, you are asked to enter the words to analyze. You can write several words, separated by whitespace, in the input area. Which characters count as part of a word depends on the language and model; usually they include at least the letters of the language's own alphabet and the apostrophe. All other characters are treated as whitespace and separate words from each other. For example, if you write "Håvard didn't get those $50 yesterday!" and use the English corpus, the actual input will be the words "h", "vard", "didn't", "get", "those" and "yesterday".
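The word-splitting rule can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: it assumes a hypothetical English alphabet consisting of the letters a-z plus the apostrophe, and it lowercases the input, which the example above suggests the demo does.

```python
import re

# Hypothetical alphabet for an English model: lowercase a-z plus the apostrophe.
# The real set of allowed characters depends on the language and model.
ENGLISH_WORD = re.compile(r"[a-z']+")


def tokenize(text: str, word_pattern: re.Pattern = ENGLISH_WORD) -> list[str]:
    """Lowercase the input and keep only runs of allowed characters.

    Every other character (digits, punctuation, letters outside the
    alphabet such as 'å') acts as whitespace and separates words.
    """
    return word_pattern.findall(text.lower())


print(tokenize("Håvard didn't get those $50 yesterday!"))
# ['h', 'vard', "didn't", 'get', 'those', 'yesterday']
```

With this alphabet "å" is not a letter, so "Håvard" splits into "h" and "vard", and "$50" disappears entirely. A model for another language would use a different character set.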
If you enter more words than the program can handle (about 10 kilobytes of text), you will see a message like "Request to receive too much data: XXX bytes". If so, press your browser's back button and try again with fewer words. You can clear the input area by pressing the Clear button.
Next, choose which segmentation model (that is, which combination of a predefined source text and a segmentation method) is used to segment your words. The model type (the segmentation method) and the corpus (the source text) are selected separately. Make your selection and press the Analyze button. (It is a good idea to use a model of the same language as your input words, though!)
When the analysis is completed, you will see the segmentation that the model produces for your words. Each segmented word is followed by a list of the morphs it contains, together with the frequencies of these morphs in the segmentation model used. In category models, the categories are shown with different colors and text decorations.
If you want more information on the model used, you can either search for words similar to the ones you gave or find all words containing a particular morph. On the segmentation result page (described above), words with more than two letters and all morphs in the training data are shown as links.
To find words in the model that resemble a segmented word, click on the word itself. A list of words starting with the same letters as the chosen word will appear. (The number of letters used in the search depends on the length of the selected word; for words of at least eight letters, half the length of the word is used.) To find words in the model that contain a morph used in the segmentation, click on the morph.
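The two searches can be sketched as follows, assuming the model is available as a plain vocabulary list and as a hypothetical `segmentations` dictionary mapping each word to its list of morphs. Rounding half the length down, the fallback for words shorter than eight letters, and the alphabetical ordering of results are all assumptions; the page does not document them.

```python
def prefix_length(word: str) -> int:
    """Number of leading letters used to look for similar words.

    For words of at least eight letters, half the length is used (rounding
    down is an assumption). The rule for shorter words is not documented,
    so this sketch hypothetically uses the whole word.
    """
    if len(word) >= 8:
        return len(word) // 2
    return len(word)  # hypothetical fallback for short words


def similar_words(word: str, vocabulary: list[str]) -> list[str]:
    """Words in the vocabulary that start with the same letters as `word`."""
    prefix = word[:prefix_length(word)]
    # Alphabetical order is an assumption about how results are listed.
    return sorted(w for w in vocabulary if w.startswith(prefix))


def words_with_morph(morph: str, segmentations: dict[str, list[str]]) -> list[str]:
    """Words whose segmentation contains the given morph."""
    return sorted(w for w, morphs in segmentations.items() if morph in morphs)


# Made-up toy data, for illustration only.
vocabulary = ["seminar", "segment", "segmented", "segmentation"]
segmentations = {
    "yesterday": ["yester", "day"],
    "daylight": ["day", "light"],
    "days": ["day", "s"],
}
print(similar_words("segmentation", vocabulary))  # ['segment', 'segmentation', 'segmented']
print(words_with_morph("day", segmentations))     # ['daylight', 'days', 'yesterday']
```

Here "segmentation" has 12 letters, so its first 6 letters ("segmen") are used as the search prefix, which excludes "seminar".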
If more than 500 words or morphs match your search, only the first 500 are shown; use the "Next words" and "Previous words" links at the top of the page to see the rest. To return to the segmentation result page, use your browser's back button until you get there. For a new input form, use the "Back to input form" link at the bottom of the page or "Try the demo" in the sidebar.
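Paging through the results amounts to slicing the result list into chunks of 500. A minimal sketch, where the helper names are hypothetical and "Next words" and "Previous words" are assumed to move exactly one page forward or back:

```python
PAGE_SIZE = 500  # at most 500 words or morphs are shown per page


def page_count(n_results: int, size: int = PAGE_SIZE) -> int:
    """Number of pages needed to show n_results items (ceiling division)."""
    return (n_results + size - 1) // size


def get_page(results: list[str], index: int, size: int = PAGE_SIZE) -> list[str]:
    """Return the 0-based page `index` of the results.

    "Next words" would request index + 1 and "Previous words" index - 1
    (assumed behaviour, for illustration only).
    """
    if index < 0:
        raise ValueError("page index must be non-negative")
    start = index * size
    return results[start:start + size]


results = [f"word{i}" for i in range(1234)]  # made-up result list
print(page_count(len(results)))   # 3
print(len(get_page(results, 2)))  # 234
```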
Page maintained by morpho at mail.cis.hut.fi, last updated Thu Jan 5 17:31:48 2012