TAOCP 4.2.2 Exercise 21

Section 4.2.2: Accuracy of Floating Point Arithmetic

Exercise 21. [M35] (T. J. Dekker.) Theorem C shows how to do exact addition of floating binary numbers. Explain how to do exact multiplication: Express the product $uv$ in the form $w + w'$, where $w$ and $w'$ are computed from two given floating binary numbers $u$ and $v$, using only the operations $\oplus$, $\ominus$, and $\otimes$.

Verified: no
Solve time: 9m50s

Exercise 4.2.2.21 [M35] (T. J. Dekker), Corrected Solution

We are asked to compute an exact decomposition of the product of two floating-point numbers $u$ and $v$ in the form

$$ uv = w + w', $$

using only the floating-point operations $\oplus$, $\ominus$, and $\otimes$. The number $w$ is the standard floating-point approximation of $uv$, and $w'$ is the exact residual, also representable as a floating-point number, so that the decomposition is exact in real arithmetic.

The method follows Dekker's TwoProduct algorithm, which combines splitting of operands with a carefully arranged computation of the residual.

Step 1: Split the operands

Let the floating-point numbers $u$ and $v$ have $p$-bit fractions. Define

$$ s = \lceil p/2 \rceil $$

and a splitting constant

$$ c = 2^s + 1. $$

For any floating-point number $x$, define its high and low parts by

$$ x_H = (c \otimes x) \ominus ((c \otimes x) \ominus x), \qquad x_L = x \ominus x_H. $$

Lemma (Dekker Splitting Lemma):

$x = x_H + x_L$ exactly.
The significant bits of $x_H$ and $x_L$ do not overlap: $x_H$ contains the leading $s$ bits of the fraction of $x$, and $x_L$ contains the remaining lower bits.
Both $x_H$ and $x_L$ are floating-point numbers.

Justification: Multiplying $x$ by $c = 2^s+1$ shifts the leading $s$ bits into a position where subtracting $(c\otimes x) \ominus x$ extracts the high part exactly. Subtracting $x_H$ from $x$ yields the low part exactly. No rounding occurs in these subtractions because the two parts do not overlap in significant bits.

Apply this to both operands:

$$ u = u_H + u_L, \qquad v = v_H + v_L $$

exactly.

Step 2: Compute the floating-point product

Compute

$$ w = u \otimes v. $$

This is the standard floating-point multiplication. The exact real product is

$$ uv = (u_H + u_L)(v_H + v_L) = u_H v_H + u_H v_L + u_L v_H + u_L v_L. $$

The goal is now to compute the residual $w' = uv - w$ exactly using floating-point operations.

Step 3: Compute the exact residual $w'$

Dekker observed that the residual can be computed as

$$ w' = ((u_H \otimes v_H \ominus w) \oplus (u_H \otimes v_L)) \oplus (u_L \otimes v_H) \oplus (u_L \otimes v_L). $$

This formula is carefully arranged so that each intermediate quantity is exactly representable in floating-point arithmetic and each addition uses $\oplus$ or $\ominus$ to capture the exact sum of two numbers that do not overlap in significant bits.

Step-by-step construction:

Compute the initial product: $w = u \otimes v$.
Compute the "high-high" product: $p_1 = u_H \otimes v_H$.
Compute the first residual: $r_1 = p_1 \ominus w$. This is exact because $p_1$ contains the leading bits of $uv$ and $w$ is the rounded product.
Compute the mixed products: $p_2 = u_H \otimes v_L$ and $p_3 = u_L \otimes v_H$.
Compute the low-low product: $p_4 = u_L \otimes v_L$.
Sum the contributions to the residual using $\oplus$ in non-overlapping order:

$$ w' = (((r_1 \oplus p_2) \oplus p_3) \oplus p_4). $$

Justification:

By the splitting lemma, the summands do not overlap in significant bits, so each $\oplus$ produces an exact sum.
The subtraction $p_1 \ominus w$ isolates the rounding error from the floating-point product $w$.
Consequently, $w'$ equals exactly $uv - w$ in real arithmetic.

Step 4: Summary of the algorithm (TwoProduct)

Given floating-point numbers $u$ and $v$:

Compute $c = 2^{\lceil p/2 \rceil} + 1$.
Split $u$ and $v$ into high and low parts:

$$ u_H = (c \otimes u) \ominus ((c \otimes u) \ominus u), \quad u_L = u \ominus u_H, $$

$$ v_H = (c \otimes v) \ominus ((c \otimes v) \ominus v), \quad v_L = v \ominus v_H. $$

Compute the floating-point product: $w = u \otimes v$.
Compute the residual:

$$ w' = (((u_H \otimes v_H \ominus w) \oplus (u_H \otimes v_L)) \oplus (u_L \otimes v_H)) \oplus (u_L \otimes v_L). $$

Then

$$ uv = w + w' $$

exactly in real arithmetic.

Step 5: Verification of correctness

Exact decomposition: By the splitting lemma, $u = u_H + u_L$ and $v = v_H + v_L$ exactly, with no overlap in bits.
Floating-point product: $w = u \otimes v$ is the standard floating-point approximation.
Residual computation: The arrangement of $w'$ ensures that each addition or subtraction is exact in floating-point arithmetic. No rounded sum i