TAOCP 4.3.2 Exercise 7

Let M_i=m_1m_2\cdots m_i , and recall that in Knuth's derivation of (24) the constants $c_{ij}$ ($i<j$) satisfy

Section 4.3.2: Modular Arithmetic

Exercise 7. ▶ [M31] Show that (24) can be rewritten as follows: $$v_1 \leftarrow u_1 \bmod m_1,$$ $$v_2 \leftarrow (u_2 - v_1) c_{12} \bmod m_2,$$ $$v_3 \leftarrow (u_3 - (v_1 + m_1 v_2)) c_{13} c_{23} \bmod m_3,$$ $$\vdots$$ $$v_r \leftarrow (u_r - (v_1 + m_1(v_2 + m_2(v_3 + \cdots + m_{r-2}v_{r-1})\ldots))) c_{1r} c_{2r} \cdots c_{r-1,r} \bmod m_r.$$ If the formulas are rewritten in this way, we see that for $r = 1$ constants $C_j = c_{1j} \cdots c_{j-1,j} \bmod m_j$ are needed instead of $r(r-1)/2$ constants $c_{ij}$ as in (24). Discuss the relative merits of this version of the formula as compared to (24), from the standpoint of computer calculation.

Verified: yes
Solve time: 2m32s


Let

$$ M_i=m_1m_2\cdots m_i , $$

and recall that in Knuth's derivation of (24) the constants $c_{ij}$ ($i<j$) satisfy

$$ c_{ij}\equiv m_i^{-1}\pmod{m_j}. $$

Formula (24) computes the mixed-radix digits $v_1,\ldots,v_r$ by successive corrections:

$$ v_1\leftarrow u_1, $$

and for $j>1$,

$$ v_j\leftarrow \Bigl( \cdots \bigl( (u_j-v_1)c_{1j}-v_2 \bigr)c_{2j} -\cdots-v_{j-1} \Bigr)c_{j-1,j} \pmod{m_j}. \tag{24} $$

We shall show that this is equivalent to the stated formulas.

Define

$$ A_1=u_j-v_1, $$

and recursively

$$ A_i=(A_{i-1}c_{i-1,j}-v_i), \qquad 2\le i\le j-1. $$

Then (24) is

$$ v_j\equiv A_{j-1}c_{j-1,j}\pmod{m_j}. $$

Since $c_{ij}\equiv m_i^{-1}\pmod{m_j}$, multiplication by $m_i$ gives

$$ A_{i-1}\equiv m_i(A_i+v_i)\pmod{m_j}. $$

Starting with $i=j-1$ and substituting repeatedly,

$$ \begin{aligned} A_{j-2} &\equiv m_{j-1}(A_{j-1}+v_{j-1}),\ A_{j-3} &\equiv m_{j-2}\bigl(m_{j-1}(A_{j-1}+v_{j-1})+v_{j-2}\bigr),\ &\ \vdots \end{aligned} $$

Hence

$$ A_1 \equiv v_2+m_2\bigl(v_3+m_3(\cdots+m_{j-2}(v_{j-1}+A_{j-1})\cdots)\bigr) \pmod{m_j}. $$

But $A_1=u_j-v_1$, so

$$ u_j- \Bigl( v_1+m_1\bigl(v_2+m_2(v_3+\cdots+m_{j-2}v_{j-1}\cdots)\bigr) \Bigr) \equiv M_{j-1}A_{j-1} \pmod{m_j}. $$

Multiplying by $M_{j-1}^{-1}\pmod{m_j}$ yields

$$ A_{j-1} \equiv \Bigl( u_j- (v_1+m_1(v_2+m_2(\cdots+m_{j-2}v_{j-1}\cdots))) \Bigr) M_{j-1}^{-1} \pmod{m_j}. $$

Therefore

$$ v_j \equiv \Bigl( u_j- (v_1+m_1(v_2+m_2(\cdots+m_{j-2}v_{j-1}\cdots))) \Bigr) M_{j-1}^{-1} \pmod{m_j}. $$

Now

$$ M_{j-1}^{-1} \equiv (m_1m_2\cdots m_{j-1})^{-1} \equiv m_1^{-1}m_2^{-1}\cdots m_{j-1}^{-1} \pmod{m_j}, $$

hence

$$ M_{j-1}^{-1} \equiv c_{1j}c_{2j}\cdots c_{j-1,j} \pmod{m_j}. $$

Thus

$$ v_j\leftarrow \Bigl( u_j- (v_1+m_1(v_2+m_2(\cdots+m_{j-2}v_{j-1}\cdots))) \Bigr) c_{1j}c_{2j}\cdots c_{j-1,j} \pmod{m_j}, $$

which is exactly the form stated in the exercise. Writing

$$ C_j\equiv c_{1j}c_{2j}\cdots c_{j-1,j}\pmod{m_j}, $$

the computation requires only the constants $C_j$, one for each modulus.

Comparison with (24)

The rewritten form has the advantage that only $r$ precomputed constants,

$$ C_1,\ldots,C_r, $$

need be stored, whereas (24) requires all

$$ \frac{r(r-1)}2 $$

constants $c_{ij}$. This is a substantial saving in storage when $r$ is large.

However, the computational behavior is different. In (24), the quantity being updated is always reduced modulo $m_j$. The successive corrections

$$ (u_j-v_1)c_{1j},\quad ((u_j-v_1)c_{1j}-v_2)c_{2j}, \quad\ldots $$

remain bounded by $m_j$, so only arithmetic on numbers of size comparable to $m_j$ is required.

In the rewritten version one must first form the mixed-radix prefix

$$ v_1+m_1(v_2+m_2(v_3+\cdots)), $$

or maintain it from previous steps. This quantity grows roughly like

$$ m_1m_2\cdots m_{j-1}, $$

and can therefore become much larger than $m_j$. The method saves storage, but it tends to require arithmetic on larger intermediate numbers.

Consequently:

  • The rewritten form is preferable when storage of the $\frac{r(r-1)}2$ constants is expensive or inconvenient.
  • Formula (24) is usually preferable when arithmetic cost is dominant, because it keeps all intermediate calculations reduced modulo $m_j$ and avoids constructing large mixed-radix prefixes.

Thus the two formulations trade storage for arithmetic efficiency.