Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Adaptive Natural Language Processing

Our goal is to learn representations that can be used for the recognition, understanding and generation of language. This goal comprises three interrelated tasks: (1) the discovery of elements of representation (e.g. words, morphemes, phonemes), (2) the discovery of their meaning relations (syntax and semantics), and (3) the discovery of structures or "rules" of their use in natural utterances (syntax and pragmatics). The research is part of the activities of two research groups, Multimodal Interfaces and Computational Cognitive Systems.


Research topics

1. Discovery of units of representation

Morpho project

Morpheme discovery

The goal is to develop data-driven methods for unsupervised morphology induction, that is, methods that discover the regularities behind word formation in natural languages without annotated training data. For more information, see the page of the Morpho project.

Keywords: morphology induction, unsupervised morpheme discovery, minimum description length principle (MDL), applications of morphemes in NLP
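
As an illustration of how the minimum description length principle can guide morpheme discovery, the following minimal sketch compares the cost of two candidate segmentations of a tiny word list. It is not the Morpho project's actual method: the function name description_length, the fixed cost of 5 bits per letter and the toy data are all assumptions made for this example. The cost has two parts, one for spelling out the morph lexicon and one for coding the corpus as a sequence of morphs.

```python
import math
from collections import Counter


def description_length(segmented_words, bits_per_letter=5.0):
    """Two-part MDL cost, in bits, of a corpus segmented into morphs.

    segmented_words: list of words, each given as a list of morph strings.
    bits_per_letter: assumed fixed cost of one letter in the lexicon
                     (a simplification chosen for this sketch).
    """
    morph_counts = Counter(m for word in segmented_words for m in word)
    total_tokens = sum(morph_counts.values())

    # Part 1: cost of the lexicon. Each distinct morph is spelled out once,
    # with one extra symbol marking the end of the morph.
    lexicon_bits = sum((len(m) + 1) * bits_per_letter for m in morph_counts)

    # Part 2: cost of the corpus. Each morph token is coded with
    # -log2 of its maximum-likelihood probability.
    corpus_bits = -sum(
        count * math.log2(count / total_tokens)
        for count in morph_counts.values()
    )
    return lexicon_bits + corpus_bits


# Toy data (hypothetical): the same four words, unsegmented and segmented.
unsegmented = [["walked"], ["walking"], ["talked"], ["talking"]]
segmented = [["walk", "ed"], ["walk", "ing"], ["talk", "ed"], ["talk", "ing"]]

cost_unsegmented = description_length(unsegmented)
cost_segmented = description_length(segmented)
print(cost_unsegmented, cost_segmented)
```

On this toy data the segmented analysis is cheaper, because the shared stems and suffixes are stored only once in the lexicon. An unsupervised learner in this spirit would search over segmentations for one with a low total cost.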

Selected publications:

Term discovery

Selected publications:

2. Discovery of meaning relations between words/morphemes

The research on emergent linguistic and cognitive representations enables computers to deal with semantics: to process data with some access to its meaning and, eventually, to its context of use.

Keywords: Self-organising semantic maps, SOM, Word ICA, Latent Semantic Analysis (LSA), random mapping (RM), word spaces, conceptual spaces
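
The idea of a word space can be sketched with plain co-occurrence counts: each word is represented by the words that appear near it, and words used in similar contexts end up with similar vectors. The example below is only an assumed, minimal illustration with hypothetical function names and toy sentences. It does not implement the SOM, ICA, LSA or random mapping methods listed in the keywords, which would typically be applied on top of counts like these.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical).
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the car needs fuel".split(),
]
vecs = cooccurrence_vectors(sentences)

sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_car = cosine(vecs["cat"], vecs["car"])
print(sim_cat_dog, sim_cat_car)
```

Here "cat" comes out closer to "dog" than to "car" because the first two share the context word "drinks". With realistic corpora the raw vectors become very high-dimensional, which is where dimensionality reduction methods become useful.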

More on emergence of linguistic representations for words.

Word sense discovery and disambiguation

A specific task in NLP where the representation of word meaning is important is the twin problem of word sense discovery and word sense disambiguation. Discovery is the process of uncovering the possible different meanings of a given word, and disambiguation is the determination of which meaning is intended in a given instance or context of the word.
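
The disambiguation half of the problem can be illustrated with a very simple overlap heuristic: pick the sense whose typical context words share the most words with the sentence at hand. This is only a sketch under stated assumptions. The sense labels and signature words below are invented for the example, and in the discovery setting described above such signatures would have to be learned from data rather than written by hand.

```python
def disambiguate(context_tokens, sense_signatures):
    """Return the sense whose signature words overlap most with the context.

    sense_signatures: dict mapping a sense label to a set of words that
    typically occur with that sense (hand-written here, hypothetical).
    """
    context = set(context_tokens)
    return max(
        sense_signatures,
        key=lambda sense: len(context & sense_signatures[sense]),
    )


# Hypothetical sense inventory for the word "bank".
SENSES = {
    "bank/finance": {"money", "loan", "account", "deposit"},
    "bank/river": {"river", "water", "shore", "fishing"},
}

sense_a = disambiguate(
    "she opened an account at the bank to deposit money".split(), SENSES
)
sense_b = disambiguate(
    "they went fishing on the river bank".split(), SENSES
)
print(sense_a, sense_b)
```

The first sentence is assigned the financial sense and the second the river sense. When no signature word appears in the context, this sketch simply returns the first sense in the dictionary, which a real system would need to handle more carefully.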

Selected publications:

3. Modeling of sequential patterns of the elements

Discovery of constructions

Selected publications:

Statistical language modeling

Statistical language modeling is the task of finding models that can accurately estimate the probabilities of natural language sequences or utterances. Language models are essential in many NLP applications, such as speech recognition and machine translation. Our research concentrates on efficient ways of representing the relevant probabilities by applying unsupervised machine learning methods.
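
To make the notion of estimating sequence probabilities concrete, the following minimal sketch trains a bigram model with add-one smoothing on a toy corpus and scores two word sequences. The class name BigramModel, the boundary symbols and the training sentences are assumptions made for this illustration. It shows a standard textbook baseline, not the models developed in this research.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word is a context
        self.bigram_counts = Counter()   # how often each (prev, cur) pair occurs
        self.vocab = set()               # words that can be predicted
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            self.vocab.update(padded[1:])  # "<s>" is never predicted
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        """Smoothed estimate of P(cur | prev)."""
        numerator = self.bigram_counts[(prev, cur)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator

    def sentence_logprob(self, tokens):
        """Natural-log probability of a whole sentence, including its end."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        return sum(
            math.log(self.prob(prev, cur))
            for prev, cur in zip(padded, padded[1:])
        )


# Toy training corpus (hypothetical).
training = [
    "the cat sleeps".split(),
    "the dog sleeps".split(),
    "the cat eats".split(),
]
model = BigramModel(training)

fluent = model.sentence_logprob("the cat sleeps".split())
scrambled = model.sentence_logprob("sleeps cat the".split())
print(fluent, scrambled)
```

The model assigns a higher probability to the fluent word order than to the scrambled one, which is the property that applications such as speech recognition rely on when ranking candidate transcriptions. Smoothing ensures that unseen word pairs still receive a small nonzero probability.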

Selected publications:

Past research projects


Here is a list of courses organized by the people in the NLP area.


Former people

Internal NatLang pages


Page maintained by krista at, last updated Friday, 06-Dec-2013 12:44:31 EET