TAOCP 4.2.2 Exercise 17

Section 4.2.2: Accuracy of Floating Point Arithmetic

Exercise 17. [28] Write a MIX subroutine, FCMP, that compares the floating point number $u$ in location ACC with the floating point number $v$ in register A, setting the comparison indicator to LESS, EQUAL, or GREATER according as $u < v$, $u = v$, or $u > v$ ${e}$; here $e$ is stored in location EPSILON as a nonnegative fixed point quantity with the radix point assumed at the left of the word. Assume normalized inputs.

Verified: yes
Solve time: 5m56s

Solution

Let the floating point number in location ACC be

$$ u=s_u,m_u,b^{e_u}, $$

and let the floating point number in register A be

$$ v=s_v,m_v,b^{e_v}, $$

where the inputs are normalized. Let $e$ be the nonnegative fixed-point quantity stored in EPSILON.

The notation ${e}$ means that $u$ and $v$ are to be regarded as equal whenever their relative difference is at most $e$. Thus FCMP should set the comparison indicator according to

$$ u<v-e|v|,\qquad |u-v|\le e|v|,\qquad u>v+e|v|. $$

Equivalently, if

$$ d=u-v, \qquad t=e|v|, $$

then

$$ d<-t \Rightarrow \text{LESS}, \qquad |d|\le t \Rightarrow \text{EQUAL}, \qquad d>t \Rightarrow \text{GREATER}. $$

The problem is therefore reduced to computing $d$ and $t$.

Computation of $d=u-v$

The critical step is exponent alignment.

Suppose first that $e_u\ge e_v$. Then

$$ u-v = b^{e_u} \left( s_u m_u - s_v m_v b^{e_v-e_u} \right). $$

Since $e_v-e_u\le0$, the factor $b^{e_v-e_u}$ is obtained by dividing the mantissa of $v$ by $b^{e_u-e_v}$.

Similarly, if $e_v>e_u$,

$$ u-v = b^{e_v} \left( s_u m_u b^{e_u-e_v} - s_v m_v \right), $$

and the mantissa of $u$ must be divided by $b^{e_v-e_u}$.

Thus exponent alignment is performed by repeatedly shifting the mantissa belonging to the smaller exponent one radix position to the right until the exponents agree. The earlier description $m_v'=m_v b^{e_v-e_u}$ is correct only when interpreted as division by $b^{e_u-e_v}$; the actual operation is a right shift.

After alignment, the signed mantissas are subtracted and the result normalized. This yields the floating point number

$$ d=u-v. $$

Computation of $t=e|v|$

The quantity stored in EPSILON is a fixed-point fraction with radix point at the left of the word. Let

$$ e=\frac{E}{b^{5}}, $$

where $E$ is the integer represented by the five bytes of EPSILON.

Let

$$ |v|=m_v b^{e_v}, $$

with $m_v>0$.

To form

$$ t=e|v|, $$

the mantissa of $|v|$ is multiplied by the fixed-point fraction $e$. Since both operands are represented as ordinary MIX words, this is done by integer multiplication of the mantissa field by $E$, followed by normalization and adjustment of the exponent. No floating-point multiply instruction is required. The result is exactly the floating-point quantity

$$ t=e|v|. $$

Computation of $|d|$

After $d=u-v$ has been formed, its sign byte is inspected.

If $d\ge0$, leave it unchanged.
If $d<0$, change the sign byte from minus to plus.

Since the magnitude field is unchanged, the resulting value is $|d|$.

This explicitly supplies the absolute-value operation needed for the equality test.

Comparison

The comparison is now straightforward.

Compare $d$ with $-t$.

$$ d<-t, $$

set the comparison indicator to LESS and return. 2. Compare $|d|$ with $t$.

$$ |d|\le t, $$

set the comparison indicator to EQUAL and return. 3. Otherwise,

$$ d>t, $$

so set the comparison indicator to GREATER and return.

The equivalence of these tests with the relative-error criterion follows immediately from the definition

$$ t=e|v|. $$

Indeed,

$$ |d|\le t \iff |u-v|\le e|v|, $$

which is exactly the condition for equality within tolerance ${e}$.

MIXAL sketch

A complete implementation follows the structure

FCMP     STJ   RETADR

* Form d = u - v.
         LDA   ACC
         STA   UBUF
         STA   DBUF

         * Align exponents.
         * Repeatedly right-shift the mantissa
         * belonging to the smaller exponent
         * until exponents agree.

         * Subtract aligned mantissas.
         * Normalize result.
         * Store d in DBUF.

* Form t = e|v|.
         LDA   A
         JAN   ABSV
CONT1    STA   VABS
         LDA   EPSILON
         MUL   VABS
         * Normalize product.
         STA   TBUF

* Test d < -t.
         CMPA  ...
         JL    LESS

* Test |d| <= t.
         LDA   DBUF
         JAN   ABSD
CONT2    CMPA  TBUF
         JLE   EQUAL

GREATER  CMPA  =0=
         JG    RETURN

LESS     * set comparison indicator LESS
         JMP   RETURN

EQUAL    * set comparison indicator EQUAL

RETURN   JMP   RETADR

ABSV     ENTA  |A|
         JMP   CONT1

ABSD     ENTA  |A|
         JMP   CONT2

The essential points are:

Exponent alignment is done by right-shifting the mantissa associated with the smaller exponent.
$|d|$ is obtained by replacing a negative sign byte by a positive sign byte.
$t=e|v|$ is formed by multiplying the mantissa of $|v|$ by the fixed-point fraction stored in EPSILON and normalizing.
The comparison indicator is set according to

$$ d<-t,\qquad |d|\le t,\qquad d>t. $$

Hence FCMP correctly implements comparison of normalized floating point numbers with relative tolerance ${e}$. ∎