T-61.184 Audio mining
Special course on Information Technology
IV (4 ov, L)
Lecturer: Prof. Mikko Kurimo
Assistant: DI Vesa Siivola
Semester: Autumn 2002
Credit points: 4 cr
Place: Seminar room Y405 in the main building of HUT
Time: Thursdays 14-16, starting from September 12
Language: English or Finnish
Seminar course description
Audio mining is a recently developed research area for
information retrieval and data exploration on audio data.
Today a huge amount of audio information
is already digitally stored or transmitted
as speeches, videos, movies, radio and television programs,
music, recorded meetings and so on.
In fact, a large proportion of human-generated information is speech, and
much of it is in the form of television and radio broadcasts.
In multimedia retrieval, audio information, especially spoken
audio, is in a key position because, when automatically
transcribed by a speech recognizer, it is closest to the
written representation on which traditional search engines operate.
However, in addition to text, the audio transcriptions may include
other information, such as rival word candidates and
the recognizer's confidence levels.
A severe difficulty remains: the IR system must tolerate
20-50% or even more misrecognized words in the transcribed text.
The recent wide interdisciplinary interest in speech and multimedia
indexing clearly shows that the indexing and retrieval of
audio segments based on their semantic content
is a significant and challenging research problem.
While it is still very difficult to index data segments by
semantic content (such as "Here's Mrs President interviewed")
automatically extracted from moving or still images alone,
this is now becoming possible with speech recognition technology.
Current developments in speech processing
(especially large-vocabulary continuous speech recognition),
in high-performance computing, and in access to digital audio
for training the recognizers
have produced several interesting pilot systems
and, more recently, commercial applications as well.
The idea of storing large amounts of audio material for retrieval
is not a new one, but the traditional, at least partly manual,
methods are so inefficient that retrieval
has so far seen very limited use.
The course is based on related special issues of journals, such as
- Accessing information in spoken audio.
Special Issue of Speech Communication Vol. 32:1-2, September 2000.
- Automatic transcription of broadcast news data.
Special Issue of Speech Communication Vol. 37:1-2, May 2002.
Additional material consists of selected journal articles
on relevant topics, such as
- Separation of speech from music and non-speech sounds
- Speech segmentation by speaker and prosody information
- Speech indexing by sub-word units
- Speech recognition of broadcast news
- Optimizing speech recognition for information retrieval
- Evaluation methods for audio retrieval
- Indexing and retrieval of music and other non-speech audio
- Existing pilot systems for audio mining
Basic knowledge of pattern recognition, speech processing,
speech recognition, natural language processing,
and information retrieval methods is helpful.
A link list on audio mining:
several short introductions and other related resources.
Browse a few, starting from the beginning, to get an idea of
what the topic involves.
Requirements for passing the course
To pass the course (4 cr), you have to
- participate actively,
- give a seminar talk based on a chosen journal article,
- generate one or two homework questions or exercises about your
assigned area for other people to solve,
- solve a set of exercises given during the seminar, and
- carry out a small project or a literature study
on the subject of your seminar talk.
To pass with distinction, the seminar talk, the handouts, and the
project work (or literature study) must each be very good, and 95% of the
exercises should be solved.
Signing up for the course
Preferably via WWWTopi,
or by attending the first meeting on September 12.
If you cannot make it, send e-mail to Mikko.Kurimo@hut.fi.
Relationship to other studies
At TKK the course is suited for Language Technology as a major or
minor subject and for studies in Information
Students and staff from the KIT network
(the Finnish network for language technology education) are also
welcome; please sign up in advance.
Mikko.Kurimo@hut.fi (tel. 451 5388)
Monday, 02-Sep-2002 14:13:49 EEST