TAOCP 3.4.2 Exercise 4

Let $p(m,t)$ denote the probability that exactly $m$ items have been selected from the first $t$ items processed by Algorithm S.

Section 3.4.2: Random Sampling and Shuffling

Exercise 4. [M23] Let $p(m, t)$ be the probability that exactly $m$ items are selected from among the first $t$ in the selection sampling technique. Show directly from Algorithm S that

$$p(m,t) = \binom{t}{m}\binom{N-t}{n-m} \Big/ \binom{N}{n}, \qquad \text{for } 0 \le t \le N.$$

Verified: yes
Solve time: 4m33s


Solution

Let $p(m,t)$ denote the probability that exactly $m$ items have been selected from the first $t$ items processed by Algorithm S.

We shall prove by induction on $t$ that

$$ p(m,t)=\frac{\binom{t}{m}\binom{N-t}{,n-m,}}{\binom{N}{n}}, \qquad 0\le t\le N. $$

Basis

When $t=0$, no items have been examined. Hence

$$ p(0,0)=1,\qquad p(m,0)=0 \ \ (m>0). $$

The proposed formula gives

$$ \frac{\binom00\binom Nn}{\binom Nn}=1 $$

for $m=0$, and

$$ \frac{\binom0m\binom N{n-m}}{\binom Nn}=0 $$

for $m>0$. Thus the formula is correct when $t=0$.

Inductive step

Assume the formula holds for a fixed $t$, and consider item $t+1$.

To have exactly $m$ selected items among the first $t+1$ items, one of two mutually exclusive events must occur.

  1. Exactly $m$ items were selected among the first $t$, and item $t+1$ is not selected.
  2. Exactly $m-1$ items were selected among the first $t$, and item $t+1$ is selected.

If $m$ items have already been selected among the first $t$, Algorithm S selects the next item with probability

$$ \frac{n-m}{N-t}. $$

Hence the probability that item $t+1$ is not selected is

$$ 1-\frac{n-m}{N-t} = \frac{N-t-n+m}{N-t}. $$

Similarly, if $m-1$ items have already been selected, the probability that item $t+1$ is selected is

$$ \frac{n-(m-1)}{N-t} = \frac{n-m+1}{N-t}. $$

Therefore

$$ p(m,t+1) = p(m,t)\frac{N-t-n+m}{N-t} + p(m-1,t)\frac{n-m+1}{N-t}. $$

Substituting the induction hypothesis,

$$ \begin{aligned} p(m,t+1) &= \frac1{\binom Nn} \Biggl[ \binom tm\binom{N-t}{n-m} \frac{N-t-n+m}{N-t} \ &\qquad\qquad + \binom t{m-1}\binom{N-t}{n-m+1} \frac{n-m+1}{N-t} \Biggr]. \end{aligned} $$

We now simplify the two terms.

For the first term, let $a=N-t$ and $b=n-m$. Then

$$ \binom ab\frac{a-b}{a} = \frac{a!}{b!(a-b)!}\frac{a-b}{a} = \frac{(a-1)!}{b!(a-b-1)!} = \binom{a-1}{b}. $$

Hence

$$ \binom{N-t}{n-m} \frac{N-t-n+m}{N-t} = \binom{N-t-1}{n-m}. $$

For the second term, using

$$ \binom ab\frac{b}{a} = \frac{a!}{b!(a-b)!}\frac{b}{a} = \frac{(a-1)!}{(b-1)!(a-b)!} = \binom{a-1}{b-1}, $$

with $a=N-t$ and $b=n-m+1$, we obtain

$$ \binom{N-t}{n-m+1} \frac{n-m+1}{N-t} = \binom{N-t-1}{n-m}. $$

Therefore

$$ \begin{aligned} p(m,t+1) &= \frac1{\binom Nn} \Bigl[ \binom tm\binom{N-t-1}{n-m} + \binom t{m-1}\binom{N-t-1}{n-m} \Bigr] \ &= \frac{\binom{N-t-1}{n-m}}{\binom Nn} \Bigl[ \binom tm+\binom t{m-1} \Bigr]. \end{aligned} $$

Using Pascal's identity,

$$ \binom tm+\binom t{m-1} = \binom{t+1}{m}, $$

we get

$$ p(m,t+1) = \frac{\binom{t+1}{m}\binom{N-t-1}{n-m}} {\binom Nn}. $$

Since $N-(t+1)=N-t-1$, this is exactly

$$ p(m,t+1) = \frac{\binom{t+1}{m}\binom{N-(t+1)}{,n-m,}} {\binom Nn}. $$

Thus the formula holds for $t+1$.

By induction,

$$ \boxed{ p(m,t) = \frac{\binom{t}{m}\binom{N-t}{,n-m,}} {\binom{N}{n}} } $$

for all $0\le t\le N$. ∎