TAOCP 3.5 Exercise 13

Section 3.5: What Is a Random Sequence?

Exercise 13. [HM27] Show that an $\infty$-distributed ${0 \ldots 1}$ sequence passes the "gap test" in the following sense: If $0 \le \alpha < \beta \le 1$ and $p = \beta - \alpha$, let $f(0) = 0$, and for $n \ge 1$ let $f(n)$ be the smallest integer $m > f(n-1)$ such that $\alpha \le U_m < \beta$; then

$$\Pr(f(n) - f(n-1) = k) = p(1-p)^{k-1}.$$

Verified: yes
Solve time: 2m44s

Let

$$ I=[\alpha,\beta), \qquad p=\beta-\alpha . $$

Since ${U_n}$ is $\infty$-distributed, every finite set of coordinates has the same joint distribution as independent uniform random variables on $[0,1)$. In particular, for any distinct indices $j_1,\ldots,j_t$,

$$ \Pr(U_{j_1}\in A_1,\ldots,U_{j_t}\in A_t) = \prod_{i=1}^t \lambda(A_i), $$

for measurable sets $A_i\subseteq[0,1)$, where $\lambda$ denotes Lebesgue measure. Hence the events

$$ E_j={U_j\in I} $$

are independent and satisfy

$$ \Pr(E_j)=\lambda(I)=p, \qquad \Pr(E_j^c)=1-p. $$

Define

$$ f(0)=0, \qquad f(n)=\min{m>f(n-1):U_m\in I}, $$

and let

$$ G_n=f(n)-f(n-1). $$

We must prove that

$$ \Pr(G_n=k)=p(1-p)^{k-1} \qquad (k\ge1). $$

Fix $n\ge1$ and $k\ge1$. Since $f(n-1)$ is a random index, we condition on its value. For every $r\ge1$,

$$ {f(n-1)=r,; G_n=k} = {f(n-1)=r,; U_{r+1}\notin I,\ldots,U_{r+k-1}\notin I,; U_{r+k}\in I}. $$

Therefore

$$ \Pr(G_n=k) = \sum_{r\ge1} \Pr!\bigl( f(n-1)=r,, U_{r+1}\notin I,\ldots,U_{r+k-1}\notin I,, U_{r+k}\in I \bigr). $$

Now the event ${f(n-1)=r}$ depends only on the coordinates

$U_1,\ldots,U_r$. The event

$$ B_r= {U_{r+1}\notin I,\ldots,U_{r+k-1}\notin I,, U_{r+k}\in I} $$

depends only on the coordinates

$U_{r+1},\ldots,U_{r+k}$.

These two collections of coordinates are disjoint. By the independence implied by $\infty$-distribution,

$$ \Pr({f(n-1)=r}\cap B_r) = \Pr(f(n-1)=r)\Pr(B_r). $$

Furthermore,

$$ \Pr(B_r) = (1-p)^{k-1}p, $$

because the $k$ coordinates involved are independent, the first $k-1$ must fall outside $I$, and the last must fall inside $I$.

Hence

$$ \Pr(G_n=k) = \sum_{r\ge1} \Pr(f(n-1)=r),(1-p)^{k-1}p. $$

Factoring out the constant term gives

$$ \Pr(G_n=k) = (1-p)^{k-1}p \sum_{r\ge1}\Pr(f(n-1)=r). $$

Since $f(n-1)$ is finite with probability $1$, the probabilities $\Pr(f(n-1)=r)$ sum to $1$. Therefore

$$ \Pr(G_n=k) = p(1-p)^{k-1}. $$

Thus, for every $n\ge1$ and $k\ge1$,

$$ \boxed{\Pr\bigl(f(n)-f(n-1)=k\bigr)=p(1-p)^{k-1}}. $$

This is exactly the geometric distribution required by the gap test. ∎