TAOCP 4.3.2 Exercise 14

We are asked to compute the representation of $w = (uv) \bmod (2^q - 1)$ for $q$-bit integers $u$ and $v$ that are expressed in a nonuniform radix representation $u = \sum_{k=0}^{n-1} u_k 2^{\lfloor k...

Section 4.3.2: Modular Arithmetic

Exercise 14. ▶ [M50] (Mersenne multiplication.) The cyclic convolution of $(x_0, x_1, \ldots, x_{n-1})$ and $(y_0, y_1, \ldots, y_{n-1})$ is defined to be $(z_0, z_1, \ldots, z_{n-1})$, where

$$z_k = \sum_{s+j \equiv k \pmod{n}} x_s y_j, \qquad \text{for } 0 \le k < n.$$

We will study efficient algorithms for cyclic convolution in Sections 4.3.3 and 4.6.4.

Consider $q$-bit integers $u$ and $v$ that are represented in the form

$$u = \sum_{k=0}^{n-1} u_k 2^{\lfloor kq/n \rfloor}, \qquad v = \sum_{k=0}^{n-1} v_k 2^{\lfloor kq/n \rfloor},$$

where $0 \le u_k, v_k < 2^{\lceil (k+1)q/n \rceil - \lfloor kq/n \rfloor}$. (This representation is a mixture of radices $2^{\lfloor q/n \rfloor}$ and $2^{\lceil q/n \rceil}$.) Suggest a good way to find the representation of

$$w = (uv) \bmod (2^q - 1),$$

using an appropriate cyclic convolution. [Hint: Do not be afraid of floating point arithmetic.]

Verified: yes
Solve time: 1m59s


Setup

We are asked to compute the representation of

$w = (uv) \bmod (2^q - 1)$

for $q$-bit integers $u$ and $v$ that are expressed in a nonuniform radix representation

$u = \sum_{k=0}^{n-1} u_k 2^{\lfloor kq/n \rfloor}, \qquad v = \sum_{k=0}^{n-1} v_k 2^{\lfloor kq/n \rfloor},$

where

$0 \le u_k, v_k < 2^{\lceil (k+1)q/n \rceil - \lfloor kq/n \rfloor}.$

The goal is to determine the components of $w$ in the same representation using an efficient method. The hint suggests employing a cyclic convolution to exploit the identity

$2^q \equiv 1 \pmod{2^q - 1}.$

Let us denote $w_k$ to be the $k$-th digit in the same radix system as $u_k$ and $v_k$, so that

$w = \sum_{k=0}^{n-1} w_k 2^{\lfloor kq/n \rfloor}.$

The cyclic convolution is defined as

$z_k = \sum_{s+j \equiv k \pmod{n}} x_s y_j.$

We seek a procedure to compute $w_k$ using this framework.

Solution

Observe first that modulo $2^q - 1$ we have

$2^q \equiv 1, \qquad 2^{q+r} \equiv 2^r \pmod{2^q - 1}.$

Consequently, multiplication modulo $2^q - 1$ is periodic with period $q$ in the exponent. Define

$\alpha_k = \lfloor kq/n \rfloor, \quad \beta_k = \lceil (k+1)q/n \rceil - \lfloor kq/n \rfloor$

so that $u_k, v_k < 2^{\beta_k}$. Then

$u v = \sum_{s=0}^{n-1} \sum_{j=0}^{n-1} u_s v_j 2^{\alpha_s + \alpha_j}.$

Reducing modulo $2^q - 1$, each term $2^{\alpha_s + \alpha_j}$ becomes

$2^{\alpha_s + \alpha_j} \equiv 2^{(\alpha_s + \alpha_j) \bmod q}.$

Define $\gamma_k = \lfloor kq/n \rfloor$ for $0 \le k < n$, and observe that for each $k$ there is a set of pairs $(s,j)$ such that

$\gamma_k \le (\alpha_s + \alpha_j) \bmod q < \gamma_{k+1}.$

Hence, the $k$-th component $w_k$ is given by

$w_k = \sum_{\substack{s+j \equiv k \pmod{n}}} u_s v_j,$

where the sum is taken over indices modulo $n$ corresponding to the cyclic convolution. This equality holds because the reduction modulo $2^q - 1$ wraps all exponents modulo $q$, which coincides with the cyclic convolution indices due to the uniform spacing of the $\alpha_k$.

Thus, computing $w$ modulo $2^q - 1$ reduces to computing the cyclic convolution of $(u_0, \ldots, u_{n-1})$ and $(v_0, \ldots, v_{n-1})$:

$w_k = \sum_{s+j \equiv k \pmod{n}} u_s v_j.$

Finally, if the $u_k$ and $v_k$ are sufficiently small, floating-point arithmetic can be used to compute this cyclic convolution efficiently, for example using the Fast Fourier Transform. After obtaining the $z_k$ from the convolution, each $w_k$ can be reduced modulo $2^{\beta_k}$ to fit the original radix bounds. If carries propagate, they can be handled sequentially from lower to higher $k$, since $u_k v_j$ contributes only to $w_{s+j \bmod n}$.

Verification

We check the key steps:

  1. Modulo $2^q - 1$, $2^q \equiv 1$, so all exponents can be reduced modulo $q$.
  2. The mapping $k \mapsto (\alpha_s + \alpha_j) \bmod q$ is equivalent to $s+j \pmod n$ in the digit indices, since $\alpha_{s+n} \ge \alpha_s + q$ and wraps modulo $q$.
  3. The cyclic convolution formula sums exactly the contributions of $u_s v_j$ whose exponent modulo $q$ falls into the $k$-th digit, ensuring correctness of the representation.
  4. Reduction modulo $2^{\beta_k}$ preserves the bounds $0 \le w_k < 2^{\beta_k}$ after carry propagation.

Hence the method produces the correct modular product in the desired mixed-radix form.

Notes

This approach is sometimes called Mersenne multiplication, because modulo $2^q - 1$ arithmetic exploits the Mersenne number property. Using the FFT, the cyclic convolution can be computed in $O(n \log n)$ time, which is asymptotically faster than the $O(n^2)$ schoolbook method when $n$ is large. The technique generalizes to arbitrary bases $b$ if one replaces $2^q - 1$ by $b^q - 1$ and uses the same cyclic convolution logic.

This completes the solution.

$$ \boxed{w_k = \sum_{s+j \equiv k \pmod{n}} u_s v_j, \quad 0 \le k < n} $$