TAOCP 4.2.2 Exercise 20

Equation (17) asserts that for every real number $x$ in the range $b^{e-1}\le |x|<b^e,$ the rounded value satisfies $\operatorname{round}(x)=x(1+\delta(x)),$ with $\delta(x)$ defined by

Section 4.2.2: Accuracy of Floating Point Arithmetic

Exercise 20. [25] (S. Linnainmaa.) Find all $u$ and $v$ for which $|u| \ge |v|$ and (17) fails.

Verified: yes
Solve time: 1m50s


Solution

Equation (17) asserts that for every real number $x$ in the range

$b^{e-1}\le |x|<b^e,$

the rounded value satisfies

$\operatorname{round}(x)=x(1+\delta(x)),$

with $\delta(x)$ defined by

$$$$

This representation fails precisely when $\delta(x)$ is undefined. Since the denominator is $x$, the only possible failure occurs when $x=0$.

Exercise 20 asks for all floating point numbers $u$ and $v$ with $|u|\ge |v|$ for which (17) fails. In the present context, the relevant quantity is the exact result of a floating point operation before rounding. Since $|u|\ge |v|$, the only operation considered in this section that can produce a zero argument for (17) is subtraction:

$$$$

Equation (17) fails exactly when

$$$$

because then the expression

$$$$

is undefined.

The condition $u-v=0$ is equivalent to

$$$$

Since the hypothesis already requires $|u|\ge |v|$, every pair with $u=v$ satisfies the stated condition, and no pair with $u\ne v$ can make (17) fail because then $u-v\ne0$ and $\delta(u-v)$ is well defined.

Therefore the complete set of solutions is

$$$$

For such pairs, $u-v=0$ and the relative-error representation (17) is undefined; for all pairs with $u\ne v$, equation (17) holds.

This completes the proof.