
T-61.183 Special Course in Information Science III P V
T-61.183 Informaatiotekniikan erikoiskurssi III L V

Lecturers: Prof. Timo Honkela, Dr.Tech. Mikko Kurimo, Dr.Tech. Jorma Laaksonen, Dr.Tech. Krista Lagus, PhD Kai Puolamäki
Semester: Spring 2005
Credit points: 3-4
Place: Lecture hall T4 in the computer science building
Time: Mondays 14-16 (24.1.2005-2.5.2005)

Multimodal Systems - Course description

Moving beyond the mainstream of merely recognizing images, text, and speech to automatic understanding of their content makes many new applications possible. By bringing in data from multiple modalities and contexts that provide the required internal conceptual dimensions, one may be able to automatically extract the correct perceptual information. Multimodal interfaces can be developed by fusing several perceptual and user-feedback modalities.

During the seminar, we will consider the following topics:

  • information retrieval from multimodal data
  • video content analysis
  • the multimodal content description standard MPEG-7
  • analysis of multimodal data, including speech, images, video, eye movements, and gestures
  • grounding word meanings in multimodal contexts
  • multimodal data segmentation
  • multimodal speech recognition
  • multimodal person recognition / identification
  • multimodal interfaces
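
As a small illustration of the fusion idea mentioned in the course description, the sketch below shows late fusion, where each modality is classified independently and the per-class scores are combined by a weighted average. All scores, weights, and names here are made-up illustrative values, not part of the course material.

```python
# Minimal late-fusion sketch: each modality produces class scores
# independently; a weighted average combines them into one decision.

def late_fusion(scores_per_modality, weights):
    """Weighted average of per-class score lists, one list per modality."""
    n_classes = len(scores_per_modality[0])
    fused = [0.0] * n_classes
    for scores, w in zip(scores_per_modality, weights):
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

# Hypothetical example: audio and visual classifiers scoring
# three speaker candidates in a multimodal person-recognition task.
audio_scores  = [0.7, 0.2, 0.1]   # made-up speech-based scores
visual_scores = [0.4, 0.5, 0.1]   # made-up face-based scores
fused = late_fusion([audio_scores, visual_scores], weights=[0.6, 0.4])
best = max(range(len(fused)), key=lambda i: fused[i])
print(fused, best)
```

Early fusion (concatenating feature vectors before classification) is the main alternative; late fusion is often simpler when the modalities are sampled at different rates.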

Potential material to be covered during the course includes:

  • Ying Li and C.-C. Jay Kuo (2003). Video Content Analysis Using Multimodal Information. Kluwer.
  • Lynne Dunckley (2003). Multimedia Databases. Addison-Wesley.
  • Chen Yu, Dana H. Ballard and Richard N. Aslin (2003). The Role of Embodied Intention in Early Lexical Acquisition. Proceedings of the Twenty-Fifth Annual Meeting of the Cognitive Science Society. Boston, MA.
  • Deb Roy (2004). Grounding Language in the World: Schema Theory Meets Semiotics.
  • Deb Roy (2003). Grounded Spoken Language Acquisition: Experiments in Word Learning. IEEE Transactions on Multimedia.
  • Proceedings of the IEEE Vol. 91, Issue 9, Sept. 2003. Special issue on human-computer multimodal interface.
    • Interacting with computers by voice: automatic speech recognition and synthesis
    • Recent advances in the automatic recognition of audiovisual speech
    • Speech-gesture driven multimodal interfaces for crisis management
    • Boosted learning in dynamic Bayesian networks for multimodal speaker detection
    • Toward an affect-sensitive multimodal human-computer interaction
    • Perceptive animated interfaces: first steps toward a new paradigm for human-computer interaction

To pass the course, you will need to:

  • participate sufficiently in the seminar meetings,
  • give a talk,
  • solve a sufficient percentage of problems, and
  • perform given empirical and/or experimental assignment(s).

Emerging timetable

  • 24 Jan 05:
    • Introduction to multimodal systems. How human beings process multimodal information. (Honkela)
    • Introducing the participants. Practical arrangements.
  • 31 Jan 05:
  • 7 Feb 05: no common session (potentially good time for the groups to meet!)
  • 14 Feb 05:
    • Group 3, Jaakko Väyrynen and Tiina Lindh-Knuutila: "Grounded Spoken Language Acquisition" (based on an article by Deb Roy) and Quan Zhon: "Grounding Language in the World: Signs, Schemas, and Meaning" (based on an article by Deb Roy)
  • 21 Feb 05:
    • Group 2 (Pöllä et al.): "Automatic Annotation of Images"
  • 28 Feb 05:
    • Group 1 (Pylkkönen et al.): "Video Analysis"
  • 14 Mar 05:
    • Group 3 (Lindh-Knuutila and Väyrynen): "Affect-sensitive multimodal human-computer interaction" (slides)
  • 21 Mar 05:
  • 4 Apr 05:
    • Group 2 (Yang et al.): topic to be agreed
  • 11 Apr 05:
    • returning homework problems
    • giving homework prototype answers
  • 18 Apr 05:
    • giving homework prototype answers