TAOCP 3.6 Exercise 14

The phenomenon occurs because the sequence $\langle X_n \rangle$ defined by $X_n = (X_{n-37} + X_{n-100}) \bmod 2$ is a linear recurrence modulo 2, which generates a purely periodic sequence of period...

Section 3.6: Summary

Exercise 14. ▶ [**] [M35] (The future versus the past.) Let $X_n = (X_{n-37} + X_{n-100}) \bmod 2$ and consider the sequences

$$(Y_0, Y_1, \ldots) = (X_0, X_1, \ldots, X_{99}, X_{200}, X_{201}, \ldots, X_{299}, X_{400}, X_{401}, \ldots, X_{499}, X_{600}, \ldots).$$

(Only the local digits at the least significant bits, at the discarding of many elements.) By applying run_array(a,200) at the (least significant bits, after discarding half of the elements.) The following experiment was repeated one hundred times using the sequence $\langle Y_n \rangle$: "Generate 100 random bits; then if 60 or more of them were 0, generate one more bit and print it." The result was to print 14527 0s and 13955 1s; but the probability that 28482 random bits contain at most 13955 1s is only about .000358.

Give a mathematical explanation why so many 0s were output.

Verified: no
Solve time: 1m52s


The phenomenon occurs because the sequence $\langle X_n \rangle$ defined by $X_n = (X_{n-37} + X_{n-100}) \bmod 2$ is a linear recurrence modulo 2, which generates a purely periodic sequence of period at most $2^{100}-1$. When we construct the subsequence $\langle Y_n \rangle$ by taking blocks $X_0,\dots,X_{99}$, $X_{200},\dots,X_{299}$, etc., we are effectively discarding a large portion of the original sequence, which introduces strong correlations between consecutive bits. The bits in each selected block are no longer independent: the sum of bits within each block is biased toward 0 modulo 2 because of the recurrence relation, and the procedure of only generating one additional bit when 60 or more zeros occur further amplifies this bias.

Thus, although the underlying $X_n$ are nominally "random" modulo 2, the thinning procedure combined with the linear dependence of the recurrence causes the experiment to produce substantially more zeros than ones. The statistical anomaly is not due to chance but to the deterministic structure of the lagged-Fibonacci sequence modulo 2 and the selective sampling rule. This explains why the observed counts of zeros and ones deviate drastically from the expected uniform distribution.