TAOCP 3.4.2 Exercise 6

Let $T$ denote the value of $t$ when Algorithm S terminates.

Section 3.4.2: Random Sampling and Shuffling

Exercise 6. [M24] What is the standard deviation of the value computed in exercise 5?

Verified: no
Solve time: 4m31s


Solution

Let $T$ denote the value of $t$ when Algorithm S terminates. As in Exercise 5, $T$ is the largest element of a uniformly chosen $n$-subset of ${1,2,\dots,N}$. Its distribution is

$$ \Pr{T=t} = \frac{\binom{t-1}{n-1}}{\binom{N}{n}}, \qquad n \le t \le N. $$

We aim to compute the standard deviation $\sigma_T = \sqrt{\operatorname{Var}(T)}$.

Step 1: First moment

Using the identity

$$ t \binom{t-1}{n-1} = n \binom{t}{n}, $$

we obtain

$$ E(T) = \frac{1}{\binom{N}{n}} \sum_{t=n}^{N} t \binom{t-1}{n-1} = \frac{n}{\binom{N}{n}} \sum_{t=n}^{N} \binom{t}{n}. $$

By the standard hockey-stick identity,

$$ \sum_{t=n}^{N} \binom{t}{n} = \binom{N+1}{n+1}. $$

Therefore

$$ E(T) = \frac{n}{\binom{N}{n}} \binom{N+1}{n+1} = \frac{n(N+1)}{n+1}. $$

This agrees with Exercise 5.

Step 2: Second moment

Write

$$ t^2 = t(t-1) + t. $$

Then

$$ t^2 \binom{t-1}{n-1} = t(t-1) \binom{t-1}{n-1} + t \binom{t-1}{n-1}. $$

We know

$$ t \binom{t-1}{n-1} = n \binom{t}{n}, \qquad t(t-1)\binom{t-1}{n-1} = n(n-1)\binom{t}{n}. $$

Hence

$$ t^2 \binom{t-1}{n-1} = n(n-1) \binom{t}{n} + n \binom{t}{n} = n(n-1+1) \binom{t}{n} = n^2 \binom{t}{n}. $$

We must be careful: we need to check coefficients. A more precise derivation is as follows.

  • Start with $t(t-1) \binom{t-1}{n-1}$. Using the identity

$$ t(t-1) \binom{t-1}{n-1} = n(n-1) \binom{t}{n}. $$

  • Then $t^2 \binom{t-1}{n-1} = t(t-1) \binom{t-1}{n-1} + t \binom{t-1}{n-1} = n(n-1) \binom{t}{n} + n \binom{t}{n} = n(n) \binom{t}{n} = n^2 \binom{t}{n}.$

Thus the computation is correct.

Then the second moment is

$$ E(T^2) = \frac{1}{\binom{N}{n}} \sum_{t=n}^{N} t^2 \binom{t-1}{n-1} = \frac{n^2}{\binom{N}{n}} \sum_{t=n}^{N} \binom{t}{n}. $$

Using the hockey-stick identity:

$$ \sum_{t=n}^{N} \binom{t}{n} = \binom{N+1}{n+1}. $$

Hence

$$ E(T^2) = \frac{n^2}{\binom{N}{n}} \binom{N+1}{n+1} = n^2 \frac{(N+1)!/( (n+1)!(N-n)! )}{N!/ (n!(N-n)! )} = n^2 \frac{(N+1)! n!}{(n+1)! N!} = n \frac{N+1}{n+1} \cdot n = \frac{n(N+1)n}{n+1} = \frac{n^2 (N+1)}{n+1}. $$

Step 3: Correct approach using the standard combinatorial formula

The previous approach miscounts the second moment. The classical correct formula uses the identity:

$$ \sum_{t=n}^{N} t^2 \binom{t-1}{n-1} = n \sum_{t=n}^{N} t \binom{t}{n}. $$

Then, applying the known identity

$$ \sum_{t=n}^{N} t \binom{t}{n} = (n+1) \binom{N+2}{n+2} - \binom{N+1}{n+1}, $$

we obtain

$$ E(T^2) = \frac{n}{\binom{N}{n}} \left[ (n+1) \binom{N+2}{n+2} - \binom{N+1}{n+1} \right]. $$

Simplify the binomial ratios:

$$ \frac{\binom{N+2}{n+2}}{\binom{N}{n}} = \frac{(N+2)(N+1)}{(n+2)(n+1)}, \qquad \frac{\binom{N+1}{n+1}}{\binom{N}{n}} = \frac{N+1}{n+1}. $$

Therefore

$$ E(T^2) = n \left[ (n+1) \frac{(N+2)(N+1)}{(n+2)(n+1)} - \frac{N+1}{n+1} \right] = n \left[ \frac{(N+2)(N+1)}{n+2} - \frac{N+1}{n+1} \right]. $$

Combine the terms:

$$ E(T^2) = n(N+1) \left( \frac{N+2}{n+2} - \frac{1}{n+1} \right) = n(N+1) \frac{(N+2)(n+1) - (n+2)}{(n+1)(n+2)} = n(N+1) \frac{nN + n}{(n+1)(n+2)} = \frac{n N (N+1)}{n+2}. $$

Check: careful. Multiply numerator:

$$ (N+2)(n+1) - (n+2) = N n + 2 n + N + 2 - n - 2 = n N + n + N = nN + N + n. $$

Yes, factor:

$$ nN + N + n = (N+1) n + N. $$

Then

$$ E(T^2) = n (N+1) \frac{(N+1) n + N}{(n+1)(n+2)}. $$

This matches the corrected formula.

Step 4: Variance

We have

$$ \operatorname{Var}(T) = E(T^2) - (E(T))^2 = n(N+1) \frac{(N+1)n + N}{(n+1)(n+2)} - \left( \frac{n(N+1)}{n+1} \right)^2. $$

Simplify:

$$ \operatorname{Var}(T) = \frac{n(N+1)((N+1)n + N)}{(n+1)(n+2)} - \frac{n^2 (N+1)^2}{(n+1)^2} = \frac{n(N+1)}{(n+1)^2(n+2)} \left[ ((N+1)n + N)(n+1) - n (N+1)(n+2) \right]. $$

Compute the bracket:

$$ ((N+1)n + N)(n+1) = (N+1)n(n+1) + N(n+1) = n(n+1)(N+1) + N(n+1) $$

$$ n(N+1)(n+2) = n(n+2)(N+1) $$

Subtract:

$$ [n(n+1)(N+1) + N(n+1)] - n(n+2)(N+1) = n(n+1)(N+1) + N(n+1) - n(n+2)(N+1) = n(n+1)(N+1) - n(n+2)(N+1) + N(n+1) = - n(N+1) + N(n+1) = N (n+1) - n(N+1) = N + N n - n N - n = N - $$