T-61.183 Special Course in Information Science III P V
(in Finnish: T-61.183 Informaatiotekniikan erikoiskurssi III L V)
Lecturers: Prof. Timo Honkela, Dr.Tech.; Dr.Tech. Jorma Laaksonen;
Dr.Tech. Krista Lagus; PhD Kai
Place: Lecture hall T4 in the computer science building
Time: Mondays at 14-16 o'clock
Multimodal Systems - Course description
Moving beyond the mainstream of merely recognizing images, text, and
speech, toward automatically understanding their content, makes many
new applications possible. By bringing in data from multiple
modalities and contexts that provide the required internal conceptual
dimensions, one may be able to extract the correct perceptual
information automatically. Multimodal interfaces can be developed by
fusing several perceptual and user-feedback modalities.
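To make the fusion idea concrete, here is a minimal sketch of
decision-level ("late") fusion, in which each modality-specific
recognizer outputs a score per class and the scores are combined by a
weighted sum. This is one simple fusion scheme among many, not a
method prescribed by the course; the modalities, weights, and scores
are invented for illustration.

    # Decision-level (late) fusion: combine per-modality class scores
    # with a weighted sum and pick the best-scoring class.
    def fuse_scores(scores_by_modality, weights):
        fused = {}
        for modality, class_scores in scores_by_modality.items():
            for label, score in class_scores.items():
                fused[label] = fused.get(label, 0.0) + weights[modality] * score
        return fused

    # Hypothetical speech and vision scores for a person-identification task.
    scores = {
        "speech": {"alice": 0.6, "bob": 0.4},
        "vision": {"alice": 0.2, "bob": 0.8},
    }
    weights = {"speech": 0.5, "vision": 0.5}

    fused = fuse_scores(scores, weights)
    print(max(fused, key=fused.get))  # -> bob (0.6 vs. alice's 0.4)

The complementary approach, early fusion, concatenates the feature
vectors of the different modalities before a single classifier is
applied; both approaches appear in the reading list below.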
During the seminar, we will consider the following topics:
- information retrieval from multimodal data
- video content analysis
- the multimodal content description standard MPEG-7
- analysis of multimodal data, including
  speech, images, video, eye movements, and gestures
- grounding word meanings in multimodal contexts (see the sketch after this list)
- multimodal data segmentation
- multimodal speech recognition
- multimodal person recognition / identification
- multimodal interfaces
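As a concrete illustration of the grounding topic mentioned above, the
toy sketch below performs cross-situational word learning in the
spirit of the Yu, Ballard and Aslin and Deb Roy readings listed
further down: each word is associated with the visually observed
referent it co-occurs with most often. The scenes and object labels
are invented, and real systems use much richer machinery (audio-visual
feature streams, mutual-information or EM-based association) than raw
counts over word tokens.

    # Toy cross-situational word grounding: count word/referent
    # co-occurrences over scenes, then ground each word in the
    # referent it co-occurs with most often. All data is invented.
    from collections import Counter, defaultdict

    # Each scene pairs an utterance with the set of visible objects.
    scenes = [
        ("the red ball rolls", {"ball"}),
        ("a red ball on the table", {"ball", "table"}),
        ("the dog chases the ball", {"dog", "ball"}),
        ("a big dog barks", {"dog"}),
    ]

    cooc = defaultdict(Counter)  # word -> Counter over referents
    for utterance, objects in scenes:
        for word in utterance.split():
            for obj in objects:
                cooc[word][obj] += 1

    for word in ("ball", "dog", "red"):
        referent, count = cooc[word].most_common(1)[0]
        print(word, "->", referent, f"({count} co-occurrences)")
    # ball -> ball (3), dog -> dog (2), red -> ball (2)

Note that even "red" gets grounded in "ball" here, which illustrates a
known weakness of pure co-occurrence counting: frequent contexts
attract unrelated words unless the statistics are normalized.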
Potential material to be covered during the course includes:
- Ying Li and C.-C. Jay Kuo (2003). Video Content Analysis
Using Multimodal Information. Kluwer.
- Lynne Dunckley (2003). Multimedia Databases. Addison-Wesley.
- Chen Yu, Dana H. Ballard and Richard N. Aslin (2003). The Role of
Embodied Intention in Early Lexical Acquisition. Proceedings of the
Twenty-Fifth Annual Meeting of the Cognitive Science Society. Boston, MA.
- Deb Roy (2004).
Grounding Language in the World: Schema Theory Meets Semiotics.
- Deb Roy (2003). Grounded Spoken Language Acquisition: Experiments
in Word Learning. IEEE Transactions on Multimedia.
- Proceedings of the IEEE, Vol. 91, Issue 9, Sept. 2003.
  Special issue on human-computer multimodal interface, including the articles:
  - Interacting with computers by voice: automatic speech recognition and synthesis
  - Recent advances in the automatic recognition of audiovisual speech
  - Speech-gesture driven multimodal interfaces for crisis management
  - Boosted learning in dynamic Bayesian networks for multimodal speaker detection
  - Toward an affect-sensitive multimodal human-computer interaction
  - Perceptive animated interfaces: first steps toward a new paradigm for
    human-computer interaction
To pass the course, you will need to:
- participate sufficiently in the seminar meetings,
- give a talk,
- solve a sufficient percentage of problems, and
- complete the given empirical and/or experimental assignment(s).
Schedule:
- 24 Jan 05:
- Introduction to multimodal systems. How human beings
process multimodal information. (Honkela)
- Introducing the participants. Practical arrangements.
- 31 Jan 05:
- Different aspects of multimodal systems
- Assigning papers.
- 7 Feb 05: no common session (potentially good time for the groups to meet!)
- 14 Feb 05:
- Group 3: Jaakko Väyrynen and Tiina Lindh-Knuutila, "Grounded
Spoken Language Acquisition" (based on an article by Deb Roy),
and Quan Zhon, "Grounding Language in the World: Signs, Schemas, and
Meaning" (based on an article by Deb Roy)
- 21 Feb 05:
- Group 2 (Pöllä et al.): "Automatic Annotation of Images"
- 28 Feb 05:
- Group 1 (Pylkkönen et al.): "Video Analysis"
- 14 Mar 05:
- Group 3 (Lindh-Knuutila and Väyrynen): "Affect-sensitive
multimodal human-computer interaction" (slides)
- 21 Mar 05:
- Group 1 (Kivinen et al.): "MPEG-7 multimedia content description standard"
- handing out the homework problems
- 4 Apr 05:
- Group 2 (Yang et al.): topic to be agreed
- 11 Apr 05:
- deadline for returning homework solutions
- presenting model answers to the homework
- 18 Apr 05:
- presenting model answers to the homework (continued)