T-61.271 Information visualization
Exercise 7. Wed 24.11.2004, 10-12, T3
- Focus and context
- Visualizing text
- 1. Fisheye view.
a. Assume that a data structure (e.g. the contents of a book) can
be represented by a tree of height $h$ and branching factor $b$.
How many steps does it take to get from one leaf node to another
by moving through fisheye views of the tree?
b. Present a (short) piece of program code of your choice in a
fisheye view, with the focus set on one of its lines.
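Part a can be explored numerically. The following is a minimal Python simulation that uses one assumed fisheye rule, which may not be the rule intended in the lecture. When a tree is focused on node f, the view shows f, all ancestors of f, and all children of those ancestors. One step means clicking any shown node to make it the new focus. Nodes are encoded as tuples of child indices from the root. The helper names `visible` and `fisheye_steps` are hypothetical.

```python
from collections import deque
from itertools import product

def children(node, b, h):
    """Children of a node (a tuple of child indices from the root)."""
    if len(node) >= h:
        return []
    return [node + (i,) for i in range(b)]

def visible(focus, b, h):
    """Assumed fisheye view: the focus, its ancestors, and all children
    of those ancestors (the focus counts as its own ancestor)."""
    shown = set()
    for k in range(len(focus) + 1):
        anc = focus[:k]
        shown.add(anc)
        shown.update(children(anc, b, h))
    return shown

def fisheye_steps(src, dst, b, h):
    """Minimum number of clicks from src to dst (breadth-first search),
    where one click refocuses the view on any currently visible node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in visible(node, b, h):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

b, h = 3, 4
leaves = list(product(range(b), repeat=h))
worst = max(fisheye_steps(leaves[0], t, b, h) for t in leaves)
print(worst)  # prints 4 for b=3, h=4 under this rule
```

Trying different values of `b` and `h` is a quick way to see which parameter the worst-case count depends on under this particular visibility rule.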
- 2. The size of the vocabulary can be reduced by removing some
rare and very common words ("and", "the", ...). What could you do
to reduce the vocabulary further?
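One possible direction is stemming, which maps inflected forms of a word onto a shared stem. The Python sketch below illustrates the effect on a made-up four-document corpus. It uses a tiny hand-written stop-word list and a deliberately crude suffix stripper (`crude_stem`, a hypothetical helper that is not the Porter stemmer). A real system would use a proper stemmer or lemmatizer.

```python
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "a", "of"}

def crude_stem(word):
    """Toy suffix stripper (NOT the Porter stemmer): removes one common
    English suffix if at least three characters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def vocabulary(docs, stem=False):
    """Distinct word types in docs after stop-word removal, optionally stemmed."""
    words = {w for doc in docs for w in doc.lower().split() if w not in STOPWORDS}
    return {crude_stem(w) for w in words} if stem else words

docs = [
    "the trees and the views",
    "the tree view",
    "visualizing the tree",
    "the visualized tree",
]
print(sorted(vocabulary(docs)))             # 6 word types
print(sorted(vocabulary(docs, stem=True)))  # 3 stems: tree, view, visualiz
```

Stems such as `visualiz` need not be real words. They only have to be shared by the forms they merge.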
- 3. How are the distances, and the resulting document
presentations in 2D space, affected if the document vectors are
not normalized (i.e. $\|\mathbf{x}_i\|$ is generally not equal
to $1$ for the document vectors $\mathbf{x}_i$)?
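A toy Python example shows why the norms matter. Two term-count vectors with the same word proportions but different lengths are far apart in Euclidean distance until both are scaled to unit length. The three-term vectors are made up for illustration.

```python
import math

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to unit Euclidean length."""
    n = norm(v)
    return [x / n for x in v]

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

short_doc = [1, 2, 0]    # hypothetical counts of three terms
long_doc = [10, 20, 0]   # same proportions, ten times longer
other_doc = [0, 1, 1]    # different word profile

# Unnormalized: short_doc is closer to other_doc than to long_doc.
print(dist(short_doc, long_doc), dist(short_doc, other_doc))
# Normalized: the same-profile pair collapses to (numerically) zero distance.
print(dist(normalize(short_doc), normalize(long_doc)),
      dist(normalize(short_doc), normalize(other_doc)))
```

In a 2D presentation built from such distances, unnormalized documents would tend to be arranged by length as much as by content.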
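- 4. Random projection can be used to reduce the dimension of
the document vectors: $\mathbf{y} = \mathbf{R}\mathbf{x}$, where
$\mathbf{R}$ is a $d \times n$ matrix and $d < n$. Under what
condition does the expectation of the similarity between two
projected document vectors ($\mathbf{y}_i^T \mathbf{y}_j$) remain
equal to the original similarity $\mathbf{x}_i^T \mathbf{x}_j$,
i.e. how should you generate the entries of the matrix
$\mathbf{R}$? How do you expect the variance of the similarity
$\mathbf{y}_i^T \mathbf{y}_j$ to behave as a function of the final
dimensionality $d$ of the document vectors?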
[Hints: http://www.hut.fi/Yksikot/Kirjasto/Diss/2000/isbn9512252600/, article 4;
http://www.cis.hut.fi/sami/abstracts.html#ijcnn98]
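Assume one common choice for $\mathbf{R}$: i.i.d. Gaussian entries with zero mean and variance $1/d$, so that $E[\mathbf{R}^T\mathbf{R}] = \mathbf{I}$ and hence $E[\mathbf{y}_i^T\mathbf{y}_j] = \mathbf{x}_i^T\mathbf{x}_j$. Under that assumption, the behaviour can be checked by simulation. The NumPy sketch below draws many independent matrices $\mathbf{R}$ for two fixed, hypothetical unit-length vectors. It compares the empirical mean and variance of $\mathbf{y}_i^T\mathbf{y}_j$ with the true similarity and with the Gaussian-case variance $(\|\mathbf{x}_i\|^2\|\mathbf{x}_j\|^2 + (\mathbf{x}_i^T\mathbf{x}_j)^2)/d$. Other constructions of $\mathbf{R}$ may change the details of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                       # original dimension (illustrative)
x_i = rng.random(n)           # nonnegative, like word-count vectors
x_j = rng.random(n)
x_i /= np.linalg.norm(x_i)    # unit length
x_j /= np.linalg.norm(x_j)
true_sim = x_i @ x_j

def projected_sims(d, trials=2000):
    """y_i^T y_j for `trials` independent draws of a d x n matrix R
    with i.i.d. N(0, 1/d) entries (assumed construction)."""
    sims = np.empty(trials)
    for t in range(trials):
        R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n))
        sims[t] = (R @ x_i) @ (R @ x_j)
    return sims

results = {}
for d in (5, 20, 80):
    s = projected_sims(d)
    theory = (1.0 + true_sim ** 2) / d   # Gaussian-case variance for unit vectors
    results[d] = (s.mean(), s.var(), theory)
    print(f"d={d:3d}  mean={s.mean():.3f} (true {true_sim:.3f})  "
          f"var={s.var():.4f}  theory={theory:.4f}")
```

The empirical mean stays close to `true_sim` for every `d`, while the empirical variance should track the `theory` column as `d` changes.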
- 5. Dimension reduction methods, such as the use of feature
words, random projection or latent semantic indexing, can be
useful if you use PCA or SOM to present the documents in 2D space.
However, these dimension reduction methods are useless if you use
MDS. Why? Would you expect to be able to represent the one million
documents of WEBSOM by using MDS instead of SOM?
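A useful observation is that MDS receives only the $N \times N$ matrix of pairwise distances. The NumPy sketch below uses classical (Torgerson) MDS, which is one particular MDS variant chosen here for brevity. Iterative variants such as Sammon's mapping take the same pairwise distances as input. The documents are random hypothetical vectors. The last lines count the distinct pairs among one million documents and estimate their storage at 8 bytes per distance.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed N points in k dimensions using only
    the N x N matrix D of pairwise Euclidean distances."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 500))   # 6 hypothetical documents, 500 dimensions
D = pairwise_distances(X)       # the only input classical_mds sees
Y = classical_mds(D, k=2)       # 2D coordinates, shape (6, 2)

N_websom = 1_000_000
pairs = N_websom * (N_websom - 1) // 2     # about 5e11 distinct pairs
print(pairs, pairs * 8 / 1e12, "TB at 8 bytes per distance")
```

The vector dimension (500 here) enters only when `D` is computed. After that, the cost of `classical_mds` depends on `N`, since it builds and decomposes an `N` x `N` matrix.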
Tapani Raiko
2004-11-22