TAOCP 4.2.1 Exercise 3

Section 4.2.1: Single-Precision Calculations

Exercise 3. [11] (K. Zuse, 1936.) Show that if we are using normalized floating binary arithmetic, there is a way to increase the precision slightly without loss of memory space: A $p$-bit fraction part can be represented using only $p - 1$ bit positions of a computer word, if the range of exponent values is decreased very slightly.

Verified: no
Solve time: 4m45s

In normalized binary floating-point arithmetic, every nonzero number is represented in the form

$$ x=\pm(0.1b_2b_3\cdots b_p)_2,2^e, $$

where the leading fraction bit is always $1$. The normalization condition forces the first digit after the binary point to be $1$, so the $p$-bit fraction has the form

$$ 1b_2b_3\cdots b_p. $$

Hence there are only

$$ 2^{p-1} $$

possible normalized fraction patterns, because the first bit is fixed and only the remaining $p-1$ bits vary.

Therefore it is unnecessary to store the leading bit explicitly. We may store only

$$ b_2b_3\cdots b_p, $$

using $p-1$ bit positions, and reconstruct the full fraction by supplying the omitted leading $1$ whenever the number is interpreted. This is the familiar "hidden-bit" or "implicit leading 1" technique.

Now suppose the original machine allocates $p$ bits for the fraction and $q$ bits for the exponent. The total storage is $p+q$ bits.

After suppressing the leading fraction bit, only $p-1$ fraction bits are needed. One bit position is thus freed. To keep the total word length unchanged, this extra bit may be added to the exponent field, giving $q+1$ exponent bits.

An exponent field of $q+1$ bits can represent twice as many exponent codes as a field of $q$ bits. We use one of these extra codes to indicate that the leading fraction bit is implicit. The remaining exponent codes then represent almost the same exponent range as before. Specifically, if the original exponent range contained $2^q$ values, the new representation uses one of the $2^{q+1}$ codes for the hidden-bit convention and the other $2^{q+1}-1$ codes for exponents. Since

$$ 2^{q+1}-1 > 2^q, $$

the exponent range can easily be chosen to differ from the original range by only a negligible amount, for example by sacrificing one extreme exponent value.

Thus a normalized $p$-bit binary fraction can be represented using only $p-1$ stored fraction bits. The missing leading $1$ is supplied implicitly, and the resulting reduction in the exponent range is at most slight. Consequently, the precision is effectively increased by one binary digit without increasing the total memory required.