## Tik-61.181 Special Course on IT I, autumn 98

The maximum amount of points is 30. Your total amount of points will be converted to a grade as follows:
```Points:    Grade:
0-15       0
16-18      1
19-21      2
22-24      3
25-27      4
28-30      5
```
You must get at least grade 1 to pass the course.

Problems:

1.1
max 3p
A discrete memoryless source emits a sequence of statistically independent binary digits with probabilities p(1) = 0.005 and p(0) = 0.995. The digits are taken 100 at a time and a binary codeword is provided for every sequence of 100 digits containing three of fewer ones.
 (a) Assuming that all codewords are the same length, find the minimum length required to provide codewords for all sequences with three or fewer ones. (b) Calculate the probability of observing a source sequence for which no codeword has been assigned. (c) Use Chebyshev's inequality to bound the probability of observing a source sequence for which no codeword has been assigned. Compare this bound with the actual probability computed in part (b).
1.2
max 2p
Show that the entropy of the probability distribution, (p1, ..., pi, ..., pj, ..., pm), is less than the entropy of the distribution (p1, ..., (pi+pj)/2, ..., (pi+pj)/2, ..., pm). Show that in general any transfer of probability that makes the distribution more uniform increases the entropy.
2.1
max 2p
Determine a Huffman code for the text of this problem.
2.2
max 2p
Consider all possible 256×256 pixel images with 8 bit grayscale pixels. How many images can be compressed by 20 bits or more using a uniquely decodable code? Compare the result with the number of all possible images.
2.3
max 2p
Assume there is a source emitting symbols a and b with probability P(a | ß)=ß. Given a sequence s with Na symbols a and Nb symbols b, what is the probability P(a | s) if p(ß) = [ß(1-ß)]-1/2 / pi.
3.1
max 2p
The noisy channel coding theorem states that for a discrete memoryless channel, it is possible to achieve a finite rate of transmission with arbitrarily small error probability. Discuss the reasons why in real applications higher error probabilities are unavoidable.
3.2
max 2p
Which distributions p(x) maximize the differential entropy of x for each of the following constraints:
• E(|x|)=1
• Absolute value |x| is at most one
• E(x^2)=1
Also calculate the maximum value of the entropy.
4.1
max 2p
Recompute the posterior probability (10.33) in the book by using the given conjugate priors for the mean and variance.
4.2
max 2p
With Laplace's method, what is the approximation for the posterior probability density if the posterior has n local maxima at x1, x2, ..., xn, the corresponding joint densities of xi and the data D are p(x1, D), ..., p(xn, D) and the corresponding Hessian matrices of -log p(xi, D) are H1, ..., Hn.
5.1
max 2p
Estimate in bits the total sensory experience that you have had in your life - visual information, auditory information etc. Estimate how much information you have memorized. Compare these with the capacity of your brain assuming you have 1011 neurons each making 1000 synaptic connections and that the capacity result for one neuron (two bits per connection) applies. Is your brain full yet?
6.1
max 2p
In the MML principle the coding cost is measured by assuming a random discretisation of the parameters wi with accuracies ei. The coding cost of the accuracies is neglected. Show that the MML principle can be seen as a special case of ensemble learning and that the bits-back argument also justifies the neglection of the coding cost of the accuracies. (Hint: consider what kind of ensemble distribution for the weights is implicitly used in the MML principle.)
7.1
max 1p
Exercise 20.1 in the book, page 387.
7.2
max 2p
Exercise 21.3 in the book, page 403.
8.1
max 2p
Exercise 19.1 in the book, page 379.
8.2
max 2p
Exercise 13.3 in the book, page 276.

That's all folks. Deadline 15th February.

Tästä sivusta vastaa Harri.Lappalainen@hut.fi.
Sivua on viimeksi päivitetty 11.11.1998.
URL: http://www.cis.hut.fi/cis/edu/Tik-61.181/exercises.html