T-61.184 Audio mining
Special course on Information Technology IV (4 cr, L)

Lecturer: Prof. Mikko Kurimo
Assistant: DI Vesa Siivola
Semester: Autumn 2002
Credit points: 4 cr
Place: Seminar room Y405 in the main building of HUT
Time: Thursdays 14-16, starting from September 12
Language: English or Finnish
Homepage: http://www.cis.hut.fi/Opinnot/T-61.184/

Seminar course description

Audio mining is a recently developed research area for information retrieval and data exploration on audio input. A huge amount of audio information is already stored or transmitted digitally in speeches, videos, movies, radio and television programs, music, recorded meetings and so on. A large proportion of human-generated information is speech, and much of it is in the form of television and radio broadcasts. In multimedia retrieval, audio information, especially spoken audio, plays a key role: once automatically transcribed by a speech recognizer, it comes closest to the written text on which traditional search engines operate. In addition to text, however, the transcriptions may carry other information, such as rival word candidates and the recognizer's confidence levels. A severe remaining difficulty is that the information retrieval (IR) system must tolerate 20-50% or even more misrecognized words in the transcribed text.
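As a rough illustration of how such extra information can help retrieval tolerate recognition errors, the Python sketch below indexes every rival word candidate weighted by its confidence, instead of keeping only the single best word. The transcript format (a list of word slots, each holding (word, confidence) alternatives), the function names and the scoring rule are hypothetical assumptions made for this example, not the method of any particular system covered in the course.

```python
from collections import defaultdict

# Hypothetical recognizer output: for each spoken document, a list of word
# slots; each slot holds rival word candidates with confidence scores.
Transcript = list[list[tuple[str, float]]]


def build_index(docs: dict[str, Transcript]) -> dict[str, dict[str, float]]:
    """Map each word to {doc_id: expected count}, summing the confidences
    of all rival candidates instead of keeping only the top word."""
    index: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for doc_id, slots in docs.items():
        for candidates in slots:
            for word, conf in candidates:
                index[word][doc_id] += conf
    return index


def search(index: dict[str, dict[str, float]], query: str) -> list[tuple[str, float]]:
    """Rank documents by the summed expected counts of the query words."""
    scores: dict[str, float] = defaultdict(float)
    for word in query.lower().split():
        # .get() does not create new entries in the defaultdict index
        for doc_id, weight in index.get(word, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "news1": [[("president", 0.6), ("precedent", 0.4)],
                  [("interviewed", 0.9), ("interview", 0.1)]],
        "news2": [[("resident", 0.7), ("president", 0.3)],
                  [("weather", 1.0)]],
    }
    print(search(build_index(docs), "President interviewed"))
```

In this toy example the 1-best transcript of news2 would read "resident weather", so a plain keyword search for "president" would miss it entirely; the confidence-weighted index still retrieves it, ranked below news1. Summing confidences as expected term counts is only one simple choice; real systems may use lattice posteriors and more elaborate term weighting.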

The recent wide interdisciplinary interest in speech and multimedia indexing clearly shows that indexing and retrieving audio segments based on their semantic content is a significant and challenging research problem.

While it is still very difficult to index data segments by automatically extracted semantic content, such as "Here the President is interviewed", from moving or still images alone, this is now becoming possible with speech recognition technology. Progress in speech processing, especially large-vocabulary continuous speech recognition, together with high-performance computing and access to digital audio for training the recognizers, has produced several interesting pilot systems and, more recently, commercial applications as well. The idea of storing large amounts of audio material for retrieval is not new, but the traditional, at least partly manual, methods are so inefficient that retrieval has so far been used only to a very limited extent.

The course is based on related special issues of journals, such as

  1. Accessing information in spoken audio. Special Issue of Speech Communication Vol. 32:1-2, September 2000.
  2. Automatic transcription of broadcast news data. Special Issue of Speech Communication Vol. 37:1-2, May 2002.
Additional material consists of selected journal articles on relevant topics, such as
  1. Separation of speech from music and non-speech sounds
  2. Speech segmentation using speaker and prosody information
  3. Speech indexing by sub-word units
  4. Speech recognition of broadcast news
  5. Optimizing speech recognition for information retrieval
  6. Evaluation methods for audio retrieval
  7. Indexing and retrieval of music and other non-speech audio
  8. Existing pilot systems for audio mining


Basic knowledge of pattern recognition, speech processing, speech recognition, natural language processing, and information retrieval methods is helpful.

A link list on audio mining contains several short introductions and other related resources. Browse a few, starting from the beginning, to get an idea of what the theme involves.

Requirements for passing the course

To pass the course (4 cr), you have to
  1. participate actively,
  2. give a seminar talk based on a chosen journal article,
  3. create one or two homework questions or exercises about your assigned area for the other participants to solve,
  4. solve a set of exercises given during the seminar, and
  5. carry out a small project or a literature study on the subject of your seminar talk.
To pass with distinction, the seminar talk, the handouts, and the project work (or study) must each be very good, and 95% of the exercises should be solved.

Signing up for the course

Sign up preferably through WWWTopi (https://webtopi.hut.fi/) or by attending the first meeting on September 12. If you cannot attend, send e-mail to Mikko.Kurimo@hut.fi.

Relationship to other studies

At TKK the course is suited to Language Technology as a major or minor subject and to studies in Information Technology. Students and staff from KIT (the Language Technology Education Network) are also welcome; please sign up in advance.

More information

Mikko.Kurimo@hut.fi (tel. 451 5388)

Monday, 02-Sep-2002 14:13:49 EEST