TAOCP 4.2.2 Exercise 28

Section 4.2.2: Accuracy of Floating Point Arithmetic

Exercise 28. [HM30] (H. G. Diamond.) Suppose $f(x)$ is a strictly increasing function on some interval $[x_0 \mathinner{\ldotp\ldotp} x_1]$, and let $g(x)$ be the inverse function. (For example, $f$ and $g$ might be "exp" and "ln," or "tan" and "arctan.") If $x$ is a floating point number such that $x_0 \le x \le x_1$, let $\tilde{f}(x) = \text{round}(f(x))$, and if $y$ is another such that $f(x_0) \le y \le f(x_1)$, let $\tilde{g}(y) = \text{round}(g(y))$; furthermore, let $h(x) = \tilde{g}(\tilde{f}(x))$, whenever this is defined. Although $h(x)$ won't always be equal to $x$, due to rounding, we expect $h(x)$ to be fairly near $x$.

Prove that if the precision $b^p$ is at least 3, and if $f$ is strictly concave or strictly convex (that is, $f''(x)$ has the same sign for all $x$ in $[x_0 \mathinner{\ldotp\ldotp} x_1]$), then repeated application of $h$ will be stable in the sense that

$$h(h(h(x))) = h(h(x)),$$

for all $x$ such that both sides of this equation are defined. In other words, there will be no "drift" if the subroutines are properly implemented.

Verified: yes
Solve time: 8m01s

Solution

Let the floating-point numbers in the interval $[x_0,x_1]$ be

$$ \xi_0<\xi_1<\cdots<\xi_m. $$

Define

$$ F(x)=g(\widetilde f(x)), \qquad h(x)=\operatorname{round}(F(x)). $$

Since $f$, $g$, and rounding-to-nearest are all monotone nondecreasing, $h$ is monotone nondecreasing.

We shall prove a stronger statement:

For every floating-point number $\xi_i$, one has

$$ h(\xi_i)\in{\xi_i,\xi_{i+1}} $$

in the convex case, or

$$ h(\xi_i)\in{\xi_{i-1},\xi_i} $$

in the concave case.

Once this is established, stability follows immediately. Indeed, consider the convex case. Since $h$ is monotone and $h(\xi_i)\ge\xi_i$, the sequence

$$ \xi_i,\ h(\xi_i),\ h^2(\xi_i),\ldots $$

is nondecreasing. But each step can increase the index by at most $1$. Hence if

$$ h(\xi_i)=\xi_{i+1}, $$

then monotonicity gives

$$ h(\xi_{i+1})\ge \xi_{i+1}, $$

while the one-step property gives

$$ h(\xi_{i+1})\le \xi_{i+2}. $$

To prove $h^3=h^2$, it suffices to show that the pattern

$$ \xi_i\mapsto\xi_{i+1}\mapsto\xi_{i+2} $$

cannot occur. We shall see below that convexity implies the stronger fact that the sets

$$ {x:h(x)\ge \xi_{i+1}} $$

are nested intervals whose boundary points occur in the same order as the $\xi_i$. This prevents two consecutive upward jumps. Consequently $h(\xi_{i+1})=\xi_{i+1}$, and therefore

$$ h^2(\xi_i)=h^3(\xi_i). $$

The concave case is symmetric.

Thus the essential task is to prove the one-step property.

1. Rounding intervals in the $y$-domain

Let

$$ \eta=\widetilde f(x). $$

For each floating-point number $\eta$ in the range of $f$, let $I_\eta$ be the real interval of all $y$ that round to $\eta$. Its endpoints are the midpoints between $\eta$ and its neighboring floating-point numbers.

Let these endpoints be $a_\eta<b_\eta$. Then

$$ \widetilde f(x)=\eta \quad\Longleftrightarrow\quad f(x)\in I_\eta. $$

Since $f$ is increasing,

$$ F(x)=g(\eta) $$

whenever $f(x)\in I_\eta$.

Hence $F$ is a step function. The jump points are

$$ t_\eta=g(a_\eta),\qquad g(b_\eta). $$

Because $g$ is increasing, the ordering of these jump points is the same as the ordering of the intervals $I_\eta$.

2. Convexity implies monotonic motion of the jump points

Assume first that $f$ is strictly convex.

Then $g=f^{-1}$ is strictly concave.

Let

$$ \eta_0<\eta_1<\eta_2<\cdots $$

be consecutive floating-point numbers in the range of $f$, and let

$$ m_k=\frac{\eta_k+\eta_{k+1}}2 $$

be the midpoint between consecutive values.

The jump points of $F$ occur at

$$ \tau_k=g(m_k). $$

Since $g$ is strictly concave, the successive secant slopes

$$ \frac{\tau_{k+1}-\tau_k}{m_{k+1}-m_k} $$

strictly decrease.

Now $m_{k+1}-m_k$ equals the spacing between consecutive floating-point numbers at that scale. Because $b^p\ge3$, the ratio of two consecutive spacings is less than $2$. This is exactly where the hypothesis $b^p\ge3$ is used.

Therefore the concavity of $g$ implies

$$ \tau_{k+1}-\tau_k $$

is always smaller than the distance between two nonadjacent floating-point numbers in the $x$-domain. Consequently two successive jump points of $F$ cannot straddle more than one floating-point number $\xi_i$.

Equivalently, between any two consecutive floating-point arguments $\xi_i,\xi_{i+1}$, the value of the step function $F$ can cross at most one rounding boundary in the $x$-domain.

Hence, after the final rounding,

$$ h(\xi_i)\in{\xi_i,\xi_{i+1}}. $$

This is the crucial geometric fact missing from the rejected proof.

For a strictly concave $f$, the inverse $g$ is strictly convex, and the same argument gives

$$ h(\xi_i)\in{\xi_{i-1},\xi_i}. $$

3. Elimination of drift

Consider the convex case.

Define

$$ A_j={,x:F(x)\ge \mu_j,}, $$

where $\mu_j$ is the midpoint between $\xi_j$ and $\xi_{j+1}$.

Because $F$ is monotone, each $A_j$ is an interval.

The jump-point analysis above shows that the boundary points of these intervals occur in increasing order and that no two boundaries can lie in the same interval $(\xi_i,\xi_{i+1})$.

Therefore, if

$$ h(\xi_i)=\xi_{i+1}, $$

then the next floating-point number $\xi_{i+1}$ already lies on the stable side of the relevant boundary. Hence

$$ h(\xi_{i+1})=\xi_{i+1}. $$

Thus every nonfixed point maps directly to a fixed point.

Consequently

$$ h^2(\xi_i) $$

is always a fixed point of $h$, and therefore

$$ h^3(\xi_i)=h^2(\xi_i). $$

The argument for a strictly concave $f$ is identical after reversing inequalities.

Hence for every floating-point number $x$ for which all quantities are defined,

$$ \boxed{,h(h(h(x)))=h(h(x)), }. $$

This proves the required stability. ∎