Exercise 35. [**] [40] Develop a set of computer subroutines for "decuple-precision floating point" arithmetic, using excess 0, base $b$, nine-place floating point number representation, where $b$ is the computer word size, and allowing a full word for the exponent. (Thus each floating point number is represented by two consecutive words, and all scaling is done by moving full words instead of by shifting within the words.)

Verified: yes
Solve time: 4m27s

Let a decuple-precision floating-point number be represented by two consecutive machine words

$$ (X_1,X_0). $$

The machine radix is $b$, where one machine word contains an integer in the range

$$ 0\le x<b. $$

The first word $X_1$ contains the signed exponent. The second word $X_0$ contains a normalized signed fraction.

The essential feature of the representation is that the fraction itself is interpreted as a radix-$b^{-1}$ floating quantity. Thus no internal bit shifting is ever required; scaling is performed only by moving whole words.

We now specify the representation and the arithmetic subroutines completely.

Representation

Write

$$ X = \sigma ,(0.x_1x_2\cdots x_9)_b, b^e, $$

where

$$ 0\le x_i<b,\qquad x_1\ne0, $$

and where $\sigma\in{+1,-1}$.

The exponent $e$ is stored in the first word $X_1$ as a signed integer in excess-$0$ notation.

The significand is stored in the second word $X_0$ as the nine-word vector

$$ (x_1,x_2,\ldots,x_9). $$

Hence the physical representation in memory is

$$ (e;\sigma,x_1,x_2,\ldots,x_9). $$

The sign is kept separately from the digits. Zero is represented by

$$ x_1=x_2=\cdots=x_9=0, $$

and then $e=0$ and $\sigma=+1$.

Normalization condition:

$$ x_1\ne0 $$

unless the number is zero.

Because each radix digit occupies an entire word, multiplication or division by $b$ is accomplished by moving words one position left or right.

Auxiliary subroutines

We first define several primitive operations.

NORM

Input:

$$ (\sigma,e,x_1,\ldots,x_9,x_{10},x_{11}), $$

where two guard digits $x_{10},x_{11}$ are present.

Output:

Normalized rounded number

$$ (\sigma,e',y_1,\ldots,y_9). $$

Algorithm:

If all digits are zero, return zero.
While $x_1=0$:

move

$$ x_i\leftarrow x_{i+1},\qquad 1\le i\le10, $$

set $x_{11}\leftarrow0$, and replace

$$ e\leftarrow e-1. $$ 3. While $x_1\ge b$:

move

$$ x_{i+1}\leftarrow x_i,\qquad 10\ge i\ge1, $$

replace

$$ x_1\leftarrow0, $$

and replace

$$ e\leftarrow e+1. $$ 4. Round to nearest.

$$ x_{10}>\frac b2, $$

or if

$$ x_{10}=\frac b2 $$

and either $x_{11}\ne0$ or $x_9$ is odd, add $1$ to $x_9$. 5. Propagate carries backward:

for $i=9,8,\ldots,1$,

if $x_i=b$, replace

$$ x_i\leftarrow0,\qquad x_{i-1}\leftarrow x_{i-1}+1. $$ 6. If $x_1=b$, perform one right shift and increase $e$ by $1$.

The returned digits are

$$ y_i=x_i,\qquad 1\le i\le9. $$

COMPARE

Input:

$$ (x_1,\ldots,x_9),\qquad(y_1,\ldots,y_9). $$

Output:

$$ -1,0,+1 $$

according as the first vector is less than, equal to, or greater than the second.

Method:

Lexicographic comparison from left to right.

SHIFT-RIGHT

Input:

$$ (x_1,\ldots,x_{11}),\qquad k\ge0. $$

Output:

$$ (0,\ldots,0,x_1,\ldots,x_{11-k}). $$

This operation is implemented only by word motion.

Addition subroutine ADD

Input:

$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$

$$ B=(\sigma_B,e_B,b_1,\ldots,b_9). $$

Output:

$$ C=A+B. $$

Algorithm:

Step A1. Exponent ordering

$$ e_A<e_B, $$

interchange $A$ and $B$.

Now define

$$ d=e_A-e_B\ge0. $$

Step A2. Alignment

Extend both fractions with guard digits:

$$ a_{10}=a_{11}=0, $$

$$ b_{10}=b_{11}=0. $$

Shift $B$ right by $d$ places.

If $d\ge11$, replace $B$ by zero.

No digit shifting within words occurs.

Step A3. Same signs

$$ \sigma_A=\sigma_B, $$

perform multiple-precision addition:

$$ c_i=a_i+b_i+\kappa_i, $$

with carries propagated right-to-left.

The result sign is

$$ \sigma_C=\sigma_A. $$

Proceed to normalization.

Step A4. Opposite signs

Use COMPARE to determine whether

$$ |A|\ge|B|. $$

Subtract the smaller fraction from the larger by Algorithm S.

The result sign is the sign of the larger operand.

Step A5. Normalize

Call NORM on the resulting digit vector.

Subtraction subroutine SUB

Compute

$$ A-B=A+(-B). $$

This is identical with ADD after reversing the sign of $B$.

Multiplication subroutine MUL

Input:

$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$

$$ B=(\sigma_B,e_B,b_1,\ldots,b_9). $$

Output:

$$ C=AB. $$

Algorithm:

Step M1. Sign and exponent

Set

$$ \sigma_C=\sigma_A\sigma_B, $$

$$ e_C=e_A+e_B. $$

Step M2. Integer product

Form the product

$$ p_k=\sum_{i+j=k} a_i b_j, $$

for

$$ 2\le k\le18. $$

Carry propagation is performed in radix $b$.

This is precisely Algorithm M applied to the vectors

$$ (a_1,\ldots,a_9), \qquad (b_1,\ldots,b_9). $$

The result occupies eighteen words:

$$ (p_1,\ldots,p_{18}). $$

Step M3. Renormalization

Since both operands satisfy

$$ b^{-1}\le (0.a_1\cdots a_9)_b<1, $$

their product satisfies

$$ b^{-2}\le |AB|<1. $$

Hence either $p_1\ne0$ or $p_2\ne0$.

If $p_1=0$, shift left one word and decrement the exponent.

Otherwise leave the product unchanged.

Step M4. Rounding

Retain

$$ p_1,\ldots,p_9, $$

use

$$ p_{10},p_{11} $$

as guard digits, and call NORM.

Division subroutine DIV

Input:

$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$

$$ B=(\sigma_B,e_B,b_1,\ldots,b_9), $$

with $B\ne0$.

Output:

$$ C=A/B. $$

Algorithm:

Step D1. Sign and exponent

Set

$$ \sigma_C=\sigma_A\sigma_B, $$

$$ e_C=e_A-e_B. $$

Step D2. Quotient digits

Apply Algorithm D to the normalized digit vectors

$$ (a_1,\ldots,a_9), \qquad (b_1,\ldots,b_9). $$

Compute eleven quotient digits

$$ q_1,\ldots,q_{11}. $$

Step D3. Normalization

If $q_1=0$, shift left one word and decrement the exponent.

Step D4. Rounding

Call NORM on

$$ (q_1,\ldots,q_{11}). $$

Comparison subroutine FCOMP

To compare two floating-point numbers:

Compare signs.
If signs agree, compare exponents.
If exponents agree, compare significands lexicographically.

Negative numbers reverse the ordering.

Overflow and underflow

Overflow occurs if the exponent exceeds the largest signed machine integer.

Underflow occurs if repeated normalization drives the exponent below the smallest signed machine integer.

The arithmetic routines return an overflow or underflow indicator together with the computed sign.

Remarks on the representation

The exercise's special representation eliminates all internal digit shifting. Every scaling operation is accomplished by moving complete words:

exponent alignment in addition uses whole-word right shifts;
normalization uses whole-word left or right shifts;
multiplication and division naturally produce word-aligned digit vectors.

Thus the floating-point package is essentially a thin layer over the multiple-precision integer routines of Section 4.3.1.

TAOCP 4.3.1 Exercise 35

Representation

Auxiliary subroutines

NORM

COMPARE

SHIFT-RIGHT

Addition subroutine ADD

Step A1. Exponent ordering

Step A2. Alignment

Step A3. Same signs

Step A4. Opposite signs

Step A5. Normalize

Subtraction subroutine SUB

Multiplication subroutine MUL

Step M1. Sign and exponent

Step M2. Integer product

Step M3. Renormalization

Step M4. Rounding

Division subroutine DIV

Step D1. Sign and exponent

Step D2. Quotient digits

Step D3. Normalization

Step D4. Rounding

Comparison subroutine FCOMP

Overflow and underflow

Remarks on the representation