Adaptive Natural Language Processing
Our goal is to learn representations that can be used for the
recognition, understanding and generation of language.
This goal consists of three interrelated tasks: the discovery of
(1) elements of representation (e.g. words, morphemes, phonemes),
(2) their meaning relations (syntax and semantics), and
(3) structures or "rules" of their use in natural utterances (syntax and pragmatics).
The research is part of the activities of two research groups,
Multimodal Interfaces and
Computational Cognitive
Systems.
Morpheme discovery
The goal is to develop data-driven methods for unsupervised
morphology induction, that is, methods that discover the
regularities behind word formation in natural languages. For more
information, see the page of the Morpho project.
Keywords: morphology induction, unsupervised morpheme discovery,
minimum description length principle (MDL), applications of morphemes
in NLP
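As a rough illustration of how the minimum description length principle
can guide morpheme discovery, the sketch below compares the two-part code
length (lexicon cost plus corpus cost) of two analyses of a toy word list.
The cost function, the flat per-letter cost BITS_PER_LETTER, the function
name description_length and the example words are all assumptions made for
illustration; this is not the model used in the publications listed below.

```python
import math
from collections import Counter

BITS_PER_LETTER = 5.0  # assumed flat cost of spelling out one letter in the lexicon


def description_length(segmented_corpus):
    """Two-part code length in bits: lexicon cost + corpus cost.

    segmented_corpus: list of words, each given as a list of morph strings.
    """
    tokens = [morph for word in segmented_corpus for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon cost: spell each distinct morph letter by letter, plus an end marker.
    lexicon_bits = sum((len(m) + 1) * BITS_PER_LETTER for m in counts)
    # Corpus cost: negative log-likelihood of the morph tokens under
    # maximum-likelihood unigram probabilities.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


# Toy data (hypothetical): the same six words, unsegmented and segmented.
words = ["walk", "walks", "walked", "talk", "talks", "talked"]
unsegmented = [[w] for w in words]
segmented = [["walk"], ["walk", "s"], ["walk", "ed"],
             ["talk"], ["talk", "s"], ["talk", "ed"]]

print("unsegmented:", description_length(unsegmented))
print("segmented:  ", description_length(segmented))
```

On this toy list the segmented analysis gets the shorter code, because the
shared stems and suffixes are stored in the lexicon only once. A real
induction method would also have to search over candidate segmentations,
which this sketch leaves out.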
Selected publications:
- Mathias Creutz and Krista Lagus (2007).
Unsupervised Models for Morpheme Segmentation and Morphology Learning.
ACM Transactions on Speech and Language Processing,
Volume 4, Issue 1, January 2007.
- Oskar Kohonen, Sami Virpioja, and Mikaela Klami (2009).
Allomorfessor: Towards unsupervised morpheme analysis.
Lecture Notes in Computer Science, 5706. Evaluating Systems for
Multilingual and Multimodal Information Access: 9th Workshop of the
Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark,
September 17-19, 2008, Revised Selected Papers.
Term discovery
The research on emergent linguistic and cognitive representations
enables computers to deal with semantics: to process data with
some access to its meaning and, eventually, to its context of use.
Keywords: Self-organising semantic maps, SOM, Word ICA, Latent
Semantic Analysis (LSA), random mapping (RM), word spaces, conceptual
spaces
More on emergence of linguistic representations for words.
Word sense discovery and disambiguation
A specific NLP task in which the representation of word meaning is
important is the twin problem of word sense discovery and word sense
disambiguation. Discovery is the process of uncovering the possible
meanings of a given word, and disambiguation is the determination of
which meaning is intended in a given instance or context of the word.
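The disambiguation half of the problem can be sketched as follows: each
sense is represented by a bag of typical context words, and an occurrence
is assigned to the sense whose bag is most similar to the words around it.
The sense labels, the profile words and the function names are invented for
illustration, and plain cosine similarity stands in for the document-map
methods used in the publications listed below.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sense profiles for the word "bank" (hypothetical data).
SENSE_PROFILES = {
    "bank/finance": Counter("money loan account deposit interest money".split()),
    "bank/river": Counter("river water shore fishing boat river".split()),
}


def disambiguate(context_words, profiles=SENSE_PROFILES):
    """Return the sense label whose profile is most similar to the context."""
    context = Counter(context_words)
    return max(profiles, key=lambda sense: cosine(context, profiles[sense]))


print(disambiguate("she opened an account to deposit money".split()))
print(disambiguate("the boat drifted along the river".split()))
```

In this sketch the sense inventory is given by hand. Sense discovery would
instead have to induce such profiles from data, for example by clustering
the contexts in which the word occurs.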
Selected publications:
- Lindén, K. (2004).
Evaluation of Linguistic Features for Word Sense
Disambiguation with Self-Organized Document Maps. Computers and
the Humanities, December 2004. Keywords: soft clustering,
linguistic features, word-sense disambiguation, document space
- Lindén, K. and Lagus, K. (2002).
Word Sense Disambiguation in Document Space.
2002 IEEE Int. Conference on Systems, Man and Cybernetics,
Tunisia, October 6-9, 2002. Electronic publication (CD-ROM).
Discovery of constructions
Selected publications:
- Krista Lagus, Oskar Kohonen, and Sami Virpioja (2009).
Towards unsupervised learning of constructions from text.
In Proceedings of the Workshop on Extracting and Using Constructions
in NLP of the 17th Nordic Conference on Computational Linguistics,
NODALIDA, May 2009. SICS Technical Report T2009:10.
Statistical language modeling
Statistical language modeling is the task of finding models
that can accurately estimate the probabilities of natural language
sequences or utterances. Language models are essential in many NLP
applications, such as speech recognition and machine translation.
Our research concentrates on efficient ways of representing the
relevant probabilities by applying unsupervised machine learning methods.
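To make "estimating the probability of a sequence" concrete, here is a
minimal sketch of a word bigram model. It uses add-one (Laplace) smoothing
purely for brevity, which is a much simpler technique than the Kneser-Ney
smoothing and morph-based models of the publications listed below. The
function names and the three training sentences are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Words the model can predict: all training words plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, vocab


def prob(prev, word, model):
    """Add-one smoothed estimate of P(word | prev)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def log_prob(sentence, model):
    """Base-2 log-probability of a sentence, including the end marker."""
    # Note: words outside the training vocabulary are not handled properly here.
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log2(prob(p, w, model)) for p, w in zip(tokens[:-1], tokens[1:]))


model = train_bigram(["the cat sleeps", "the dog sleeps", "the cat eats"])
print(log_prob("the cat sleeps", model))
print(log_prob("sleeps cat the", model))
```

The well-formed word order receives a higher log-probability than the
scrambled one, and for each history the smoothed probabilities over the
vocabulary sum to one.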
Selected publications:
- Creutz, M., Hirsimäki, T., Kurimo, M., Puurula, A.,
Pylkkönen, J., Siivola, V., Varjokallio, M., Arisoy, E.,
Saraçlar, M., and Stolcke, A. (2007).
Morph-based speech recognition and modeling of out-of-vocabulary
words across languages. ACM Transactions on Speech and Language
Processing, Volume 5, Issue 1, December 2007.
- Vesa Siivola, Teemu Hirsimäki and Sami Virpioja (2007).
On Growing and Pruning Kneser-Ney Smoothed N-Gram Models.
IEEE Transactions on Audio, Speech and Language Processing,
Volume 15, Issue 5, July 2007, pp. 1617-1624.
- Hirsimäki, T., Creutz, M., Siivola, V., Kurimo,
M., Virpioja, S., and Pylkkönen, J. (2006).
Unlimited Vocabulary Speech Recognition with Morph Language Models
Applied to Finnish. Computer Speech and Language, Volume 20, Issue
4, October 2006, pp. 515-541.
- Kurimo, M. and Lagus, K. (2002).
An Efficiently Focusing Large Vocabulary Language Model.
In International Conference on Artificial Neural
Networks (ICANN'02), Madrid, Spain, August 28-30,
2002. pp. 1068-1073.
- WEBSOM - document maps of large text collections
- USIX INTERACT - Interaction using natural language
Here is a list of courses organized by the people in the NLP area.
Page maintained by krista at cis.hut.fi,
last updated Friday, 06-Dec-2013 12:44:31 EET