Tik-61.183 Special course on information technology III

Randomized algorithms
Spring 2000
Heikki Mannila

Exercises 2, due February 16, 2000

Assume we have a file containing N records, where N is so large that we cannot keep all of the file in memory at one time. We wish to select a random subset of n records from the file. We will use the following algorithm:
Set t:=0, m:=0.
Repeat until m=n:
    Generate a random number U, uniformly distributed between 0 and 1.
    If U<(n-m)/(N-t), then
        Add the next record from the file into the subset.
        Set m:=m+1.
        Skip the next record.
    Set t:=t+1.  
Prove that the algorithm doesn't try to read more than N records from the file. Prove that each of the  $\dbinom{N}{n}$ subsets is equally likely to be selected.
Let p(x,y) be the probability that exactly x records are selected from among the first y by the previous algorithm. Show that

\begin{displaymath}p(x,y) = \binom{y}{x}\binom{N-y}{n-x}\bigg/\binom{N}{n}. \end{displaymath}

What is the expected value of t when the algorithm terminates? What is the variance?

Devise an algorithm to select a random permutation of n objects using O(n) random numbers. Each permutation should be equally likely.

Wednesday, 09-Feb-2000 20:43:29 EET