TAOCP 4.2.3 Exercise 3
Let u=(u_m+eu_l)b^{e_u}, \qquad v=(v_m+ev_l)b^{e_v}, where $u_m$ and $v_m$ are the most-significant halves of the fractions, $u_l$ and $v_l$ are the least-significant halves, and
Section 4.2.3: Double-Precision Calculations
Exercise 3. [M20] Explain why overflow cannot occur during Program M.
Verified: no
Solve time: 1m14s
Solution
Let
$$ u=(u_m+eu_l)b^{e_u}, \qquad v=(v_m+ev_l)b^{e_v}, $$
where $u_m$ and $v_m$ are the most-significant halves of the fractions, $u_l$ and $v_l$ are the least-significant halves, and
$$ e=b^{-5}. $$
Program M forms the product
$$ uv=(u_m+eu_l)(v_m+ev_l)b^{e_u+e_v} $$
by computing only those partial products that contribute to the leftmost eight fraction bytes. According to Fig. 4, the retained part of the product is obtained from
$$ u_m v_m,\qquad e,u_m v_l,\qquad e,u_l v_m, $$
while the term
$$ e^2u_l v_l $$
is discarded.
Each normalized half-fraction satisfies
$$ 0\le u_m<1,\qquad 0\le u_l<1, $$
and similarly for $v_m$ and $v_l$. Hence every partial product computed by Program M is strictly less than $1$.
The largest contribution comes from $u_m v_m$. Since both factors are less than $1$,
$$ u_m v_m<1. $$
The remaining retained contributions are multiplied by the factor $e=b^{-5}$, therefore
$$ e,u_m v_l<e, \qquad e,u_l v_m<e. $$
Consequently the quantity accumulated by the program before normalization is bounded by
$$ u_m v_m+e,u_m v_l+e,u_l v_m <1+e+e =1+2e. $$
Because $b\ge2$,
$$ e=b^{-5}\le\frac1{32}, $$
so
$$ 1+2e \le 1+\frac1{16} <2. $$
Thus the unnormalized fraction produced by the multiplication is always strictly less than $2$.
Overflow during the arithmetic operations of Program M would require a quantity of magnitude at least $2$ to arise in the accumulator. The bound above shows that this never happens. At worst, the product lies in the interval $[1,2)$ and requires a single right shift during normalization.
The exponent is adjusted only after the fraction has been formed and normalized. Exponent overflow is tested separately by the program; the exercise asks why overflow cannot occur during the multiplication itself. The reason is that the retained fraction is always less than $2$, so every intermediate partial sum fits within the available fraction field.
This completes the proof.
∎