TAOCP 4.3.1 Exercise 3

Exercise 2 generalizes Algorithm A to the addition of $m$ nonnegative $n$-place numbers (u^{(1)}_{n-1}\cdots u^{(1)}_0)_b,\; (u^{(2)}_{n-1}\cdots u^{(2)}_0)_b,\; \ldots,\; (u^{(m)}_{n-1}\cdots u^{(m)}...

Section 4.3.1: The Classical Algorithms

Exercise 3. [21] Write a MIX program for the algorithm of exercise 2, and estimate its running time as a function of $m$ and $n$.

Verified: no
Solve time: 7m25s


Solution

Exercise 2 generalizes Algorithm A to the addition of $m$ nonnegative $n$-place numbers

$$ (u^{(1)}_{n-1}\cdots u^{(1)}_0)b,; (u^{(2)}{n-1}\cdots u^{(2)}_0)b,; \ldots,; (u^{(m)}{n-1}\cdots u^{(m)}_0)_b, $$

where $m\le b$. At stage $j$, the algorithm forms

$$ t=k+\sum_{r=1}^{m}u^{(r)}_j, $$

then sets

$$ w_j=t\bmod b,\qquad k=\lfloor t/b\rfloor . $$

After the last column, $w_n=k$.

Storage conventions

Let

$$ U_r+j $$

contain the digit $u^{(r)}_j$, for $1\le r\le m$ and

$0\le j\le n-1$.

The result digits are stored in

$$ W+j,\qquad 0\le j\le n. $$

The values $m$, $n$, and $b$ are stored in locations

$M$, $N$, and $B$.

The carry $k$ is kept in location $K$.

The current column sum $t$ is accumulated in location $T$.

The index register $I_1$ contains the current digit position $j$.

The index register $I_2$ contains the current summand number $r$.

A table $BASE$ contains the addresses $U_1,U_2,\ldots,U_m$. Thus

$BASE+r$ contains the starting address of the $r$th summand.

MIX program

        ENT1    0           j ← 0
        ENTA    0
        STA     K           k ← 0

COL     LDA     K           t ← k
        STA     T

        ENT2    0           r ← 0

SUM     INC2    1           r ← r+1
        CMP2    M
        JG      DONE

        LDA     BASE,2      A ← address(Ur)
        STA     P

        LDA     T
        ADD     P,1         add u(r)j
        STA     T

        JMP     SUM

DONE    LDA     T

        ENTX    0
        DIV     B           quotient = ⌊t/b⌋,
                            remainder = t mod b

        STA     K           new carry

        STX     W,1         store wj

        INC1    1           j ← j+1
        CMP1    N
        JL      COL

        LDA     K
        STA     W,1         store wn

        HLT

The essential point is the use of MIX division. Before DIV B, the

quantity $t$ is placed in register $A$ and register $X$ is set to

zero. The operation divides the double-length quantity $(A:X)$ by

$b$. Since $t<b^2$ when $m\le b$,

$$ t=(\lfloor t/b\rfloor)b+(t\bmod b), $$

and after the division the quotient $\lfloor t/b\rfloor$ is in $A$,

while the remainder $t\bmod b$ is in $X$. Hence the carry and output

digit are obtained exactly as required by Algorithm A.

Correctness

At the beginning of each execution of label COL, let $j$ be the value

contained in $I_1$. The invariant is:

$$ K= \left\lfloor \frac{ k_{j-1}+\sum_{i=0}^{j-1} \left(\sum_{r=1}^{m}u^{(r)}_i\right)b^i } {b^j} \right\rfloor , $$

and the locations $W,W+1,\ldots,W+j-1$ contain the already computed

digits $w_0,\ldots,w_{j-1}$.

The loop SUM forms

$$ T=K+\sum_{r=1}^{m}u^{(r)}_j=t. $$

The division step computes

$$ A=\left\lfloor \frac{t}{b}\right\rfloor, \qquad X=t\bmod b. $$

Therefore

$$ K\leftarrow \left\lfloor \frac{t}{b}\right\rfloor, \qquad W+j\leftarrow t\bmod b, $$

which is exactly the rule of Exercise 2. Thus the invariant is preserved

for the next column. After the final column $j=n-1$ has been processed,

the carry remaining in $K$ is $w_n$, which is stored and the program

halts. Hence the program correctly computes the $(n+1)$-digit sum.

Running time

For each column:

  1. Initialize $T$ from the carry: $2$ instructions.
  2. Initialize $I_2$: $1$ instruction.
  3. For each of the $m$ summands:
  • increment $I_2$,
  • compare with $M$,
  • load base address,
  • store it,
  • load $T$,
  • add the digit,
  • store $T$,
  • jump back.

This is $8m+O(1)$ instructions. 4. Perform division and store carry and digit: $5$ instructions. 5. Advance $j$ and test termination: $3$ instructions.

Hence the instruction count is

$$ (8m+11)n+O(1). $$

Therefore the running time grows linearly in both $m$ and $n$:

$$ \boxed{\Theta(mn)}. $$

This is the expected complexity, since every one of the $mn$ input

digits must be examined exactly once.