TAOCP 4.3.1 Exercise 35
Let a decuple-precision floating-point number be represented by two consecutive machine words (X_1,X_0).
Section 4.3.1: The Classical Algorithms
Exercise 35. [**] [40] Develop a set of computer subroutines for "decuple-precision floating point" arithmetic, using excess 0, base $b$, nine-place floating point number representation, where $b$ is the computer word size, and allowing a full word for the exponent. (Thus each floating point number is represented by two consecutive words, and all scaling is done by moving full words instead of by shifting within the words.)
Verified: yes
Solve time: 4m27s
Let a decuple-precision floating-point number be represented by two consecutive machine words
$$ (X_1,X_0). $$
The machine radix is $b$, where one machine word contains an integer in the range
$$ 0\le x<b. $$
The first word $X_1$ contains the signed exponent. The second word $X_0$ contains a normalized signed fraction.
The essential feature of the representation is that the fraction itself is interpreted as a radix-$b^{-1}$ floating quantity. Thus no internal bit shifting is ever required; scaling is performed only by moving whole words.
We now specify the representation and the arithmetic subroutines completely.
Representation
Write
$$ X = \sigma ,(0.x_1x_2\cdots x_9)_b, b^e, $$
where
$$ 0\le x_i<b,\qquad x_1\ne0, $$
and where $\sigma\in{+1,-1}$.
The exponent $e$ is stored in the first word $X_1$ as a signed integer in excess-$0$ notation.
The significand is stored in the second word $X_0$ as the nine-word vector
$$ (x_1,x_2,\ldots,x_9). $$
Hence the physical representation in memory is
$$ (e;\sigma,x_1,x_2,\ldots,x_9). $$
The sign is kept separately from the digits. Zero is represented by
$$ x_1=x_2=\cdots=x_9=0, $$
and then $e=0$ and $\sigma=+1$.
Normalization condition:
$$ x_1\ne0 $$
unless the number is zero.
Because each radix digit occupies an entire word, multiplication or division by $b$ is accomplished by moving words one position left or right.
Auxiliary subroutines
We first define several primitive operations.
NORM
Input:
$$ (\sigma,e,x_1,\ldots,x_9,x_{10},x_{11}), $$
where two guard digits $x_{10},x_{11}$ are present.
Output:
Normalized rounded number
$$ (\sigma,e',y_1,\ldots,y_9). $$
Algorithm:
- If all digits are zero, return zero.
- While $x_1=0$:
move
$$ x_i\leftarrow x_{i+1},\qquad 1\le i\le10, $$
set $x_{11}\leftarrow0$, and replace
$$ e\leftarrow e-1. $$ 3. While $x_1\ge b$:
move
$$ x_{i+1}\leftarrow x_i,\qquad 10\ge i\ge1, $$
replace
$$ x_1\leftarrow0, $$
and replace
$$ e\leftarrow e+1. $$ 4. Round to nearest.
If
$$ x_{10}>\frac b2, $$
or if
$$ x_{10}=\frac b2 $$
and either $x_{11}\ne0$ or $x_9$ is odd, add $1$ to $x_9$. 5. Propagate carries backward:
for $i=9,8,\ldots,1$,
if $x_i=b$, replace
$$ x_i\leftarrow0,\qquad x_{i-1}\leftarrow x_{i-1}+1. $$ 6. If $x_1=b$, perform one right shift and increase $e$ by $1$.
The returned digits are
$$ y_i=x_i,\qquad 1\le i\le9. $$
COMPARE
Input:
$$ (x_1,\ldots,x_9),\qquad(y_1,\ldots,y_9). $$
Output:
$$ -1,0,+1 $$
according as the first vector is less than, equal to, or greater than the second.
Method:
Lexicographic comparison from left to right.
SHIFT-RIGHT
Input:
$$ (x_1,\ldots,x_{11}),\qquad k\ge0. $$
Output:
$$ (0,\ldots,0,x_1,\ldots,x_{11-k}). $$
This operation is implemented only by word motion.
Addition subroutine ADD
Input:
$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$
$$ B=(\sigma_B,e_B,b_1,\ldots,b_9). $$
Output:
$$ C=A+B. $$
Algorithm:
Step A1. Exponent ordering
If
$$ e_A<e_B, $$
interchange $A$ and $B$.
Now define
$$ d=e_A-e_B\ge0. $$
Step A2. Alignment
Extend both fractions with guard digits:
$$ a_{10}=a_{11}=0, $$
$$ b_{10}=b_{11}=0. $$
Shift $B$ right by $d$ places.
If $d\ge11$, replace $B$ by zero.
No digit shifting within words occurs.
Step A3. Same signs
If
$$ \sigma_A=\sigma_B, $$
perform multiple-precision addition:
$$ c_i=a_i+b_i+\kappa_i, $$
with carries propagated right-to-left.
The result sign is
$$ \sigma_C=\sigma_A. $$
Proceed to normalization.
Step A4. Opposite signs
Use COMPARE to determine whether
$$ |A|\ge|B|. $$
Subtract the smaller fraction from the larger by Algorithm S.
The result sign is the sign of the larger operand.
Step A5. Normalize
Call NORM on the resulting digit vector.
Subtraction subroutine SUB
Compute
$$ A-B=A+(-B). $$
This is identical with ADD after reversing the sign of $B$.
Multiplication subroutine MUL
Input:
$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$
$$ B=(\sigma_B,e_B,b_1,\ldots,b_9). $$
Output:
$$ C=AB. $$
Algorithm:
Step M1. Sign and exponent
Set
$$ \sigma_C=\sigma_A\sigma_B, $$
$$ e_C=e_A+e_B. $$
Step M2. Integer product
Form the product
$$ p_k=\sum_{i+j=k} a_i b_j, $$
for
$$ 2\le k\le18. $$
Carry propagation is performed in radix $b$.
This is precisely Algorithm M applied to the vectors
$$ (a_1,\ldots,a_9), \qquad (b_1,\ldots,b_9). $$
The result occupies eighteen words:
$$ (p_1,\ldots,p_{18}). $$
Step M3. Renormalization
Since both operands satisfy
$$ b^{-1}\le (0.a_1\cdots a_9)_b<1, $$
their product satisfies
$$ b^{-2}\le |AB|<1. $$
Hence either $p_1\ne0$ or $p_2\ne0$.
If $p_1=0$, shift left one word and decrement the exponent.
Otherwise leave the product unchanged.
Step M4. Rounding
Retain
$$ p_1,\ldots,p_9, $$
use
$$ p_{10},p_{11} $$
as guard digits, and call NORM.
Division subroutine DIV
Input:
$$ A=(\sigma_A,e_A,a_1,\ldots,a_9), $$
$$ B=(\sigma_B,e_B,b_1,\ldots,b_9), $$
with $B\ne0$.
Output:
$$ C=A/B. $$
Algorithm:
Step D1. Sign and exponent
Set
$$ \sigma_C=\sigma_A\sigma_B, $$
$$ e_C=e_A-e_B. $$
Step D2. Quotient digits
Apply Algorithm D to the normalized digit vectors
$$ (a_1,\ldots,a_9), \qquad (b_1,\ldots,b_9). $$
Compute eleven quotient digits
$$ q_1,\ldots,q_{11}. $$
Step D3. Normalization
If $q_1=0$, shift left one word and decrement the exponent.
Step D4. Rounding
Call NORM on
$$ (q_1,\ldots,q_{11}). $$
Comparison subroutine FCOMP
To compare two floating-point numbers:
- Compare signs.
- If signs agree, compare exponents.
- If exponents agree, compare significands lexicographically.
Negative numbers reverse the ordering.
Overflow and underflow
Overflow occurs if the exponent exceeds the largest signed machine integer.
Underflow occurs if repeated normalization drives the exponent below the smallest signed machine integer.
The arithmetic routines return an overflow or underflow indicator together with the computed sign.
Remarks on the representation
The exercise's special representation eliminates all internal digit shifting. Every scaling operation is accomplished by moving complete words:
- exponent alignment in addition uses whole-word right shifts;
- normalization uses whole-word left or right shifts;
- multiplication and division naturally produce word-aligned digit vectors.
Thus the floating-point package is essentially a thin layer over the multiple-precision integer routines of Section 4.3.1.