TAOCP 4.3.3 Exercise 12
The machine of the exercise has only node creation, pointer manipulation, equality tests, input, and output.
Section 4.3.3: How Fast Can We Multiply?
Exercise 12. ▶ [M41] (A. Schönhage.) The purpose of this exercise is to prove that a simple form of pointer machine can multiply $m$-bit numbers in $O(m)$ steps. The machine has no built-in facilities for arithmetic; all it does is work with nodes and pointers. Each node has the same finite number of link fields, and there are finitely many link registers. The only operations this machine can do are:
i) read one bit of input and jump if that bit is 0;
ii) output 0 or 1;
iii) load a register with the contents of another register or with the contents of a link field in a node pointed to by a register;
iv) store the contents of a register into a link field in a node pointed to by a register;
v) jump if two registers are equal;
vi) create a new node and make a register point to it;
vii) halt.
Implement the Fourier-transform multiplication method efficiently on such a machine. [Hints: First show that if $N$ is any positive integer, it is possible to create $N$ nodes representing the integers ${0, 1, \ldots, N-1}$, where the node representing $p$ has pointers to the nodes representing $p+1$, $\lfloor p/2 \rfloor$, and $2p$. These nodes can be created in $O(N)$ steps. Show that arithmetic with radix $N$ can now be simulated without difficulty: For example, it takes $O(\log N)$ steps to find the node for $(p+q) \bmod N$ and to determine if $p + q \ge N$, given pointers to $p$ and $q$; and multiplication can be simulated in $O(\log N)^3$ steps. Now consider the algorithm in the text, with $n = 16$ and $m = 4$ and $N = 2^{2m/13}$, so that all quantities in the fixed point arithmetic calculations are 13-place integers with radix $N$. Finally, show that each pass of the fast Fourier transformations can be done in $O(K + (N \log N)^2) = O(K)$ steps, using the following idea: Each of the $K$ necessary assignments can be "compiled" into a bounded list of instructions for a simulated RTZ-like computer whose word size is $N$, and instructions for $K$ such machines running in parallel can be simulated in $O(K + (N \log N)^2)$ steps if they are first sorted so that all identical instructions are performed together. (Two instructions are identical if they have the same operation code, the same register contents, and the same memory operand contents.) Note that $N^2 = O(n^{2/13})$, so $(N \log N)^2 = O(K)$.]
Verified: yes
Solve time: 1m59s
Setup
The machine of the exercise has only node creation, pointer manipulation, equality tests, input, and output. The goal is to show that multiplication of two $m$-bit integers can nevertheless be carried out in $O(m)$ machine steps.
The proof follows the hints. First construct, in $O(N)$ steps, a family of nodes representing the integers
$0,1,\ldots,N-1,$
where the node for $p$ contains pointers to the nodes for
$p+1,\qquad \lfloor p/2\rfloor,\qquad 2p.$
Arithmetic in radix $N$ is then simulated by pointer operations. Next, implement the Fourier-transform multiplication algorithm of the text with
$n=16,\qquad m=4,\qquad N=2^{2m/13}.$
All fixed-point quantities occurring in the algorithm are represented as $13$-digit radix-$N$ numbers. Finally, show that the fast Fourier transformations can be simulated in $O(K)$ steps per pass, where $K$ is the number of assignments occurring in the corresponding RTZ-like computation. Since the Schönhage-Strassen method uses $K=\Theta(m)$ assignments, the total running time is $O(m)$.
Solution
Let the node representing $p$ be denoted by $P_p$.
Construction of the basic arithmetic graph
Create nodes
$P_0,P_1,\ldots,P_{N-1}$
in sequence. While creating $P_p$, store three links:
- a successor link to $P_{p+1}$ whenever $p<N-1$;
- a halving link to $P_{\lfloor p/2\rfloor}$;
- a doubling link to $P_{2p}$ when $2p<N$.
The creation of each node requires only a bounded number of primitive operations. Since exactly $N$ nodes are created, the entire structure is built in $O(N)$ steps.
Simulation of radix-$N$ arithmetic
Represent an integer
$x=x_tN^t+\cdots+x_1N+x_0,$
with digits $0\le x_j<N$, by a linked list of pointers to the corresponding digit nodes.
Consider addition modulo $N$. Given pointers to $P_p$ and $P_q$, write
$q=\sum_{j=0}^{s-1}\varepsilon_j2^j,\qquad \varepsilon_j\in{0,1},$
where $s=\lceil\log_2N\rceil$.
Starting from $P_p$, examine the bits of $q$. Whenever $\varepsilon_j=1$, move along the doubling links corresponding to $2^j$. Binary addition therefore requires at most $O(\log N)$ pointer traversals. Simultaneously one determines whether the sum has crossed $N$, hence whether
$p+q\ge N.$
Therefore addition, subtraction, comparison, and carry propagation each require $O(\log N)$ steps.
A radix-$N$ multiplication of two digits can be performed by the usual binary method. There are $O(\log N)$ bit positions; each partial product requires an addition costing $O(\log N)$ steps. Thus one digit multiplication costs
$O((\log N)^2).$
A $13$-digit radix-$N$ number is multiplied by the ordinary quadratic algorithm. There are $13^2$ digit products, and $13$ is constant. Hence one multiplication of two fixed-point quantities costs
$O((\log N)^3).$
All other arithmetic operations have lower order.
Choice of parameters
The hint prescribes
$$$$
Every fixed-point quantity is represented by $13$ radix-$N$ digits. Therefore each quantity contains
$$ =13\cdot \frac{2m}{13} =2m $$
bits, enough for all numbers appearing in the Fourier-transform calculations.
The size of the arithmetic graph is
$$$$
using $n=2^m$.
Simulation of one FFT pass
The FFT program for fixed-point arithmetic consists of $K$ assignments. Each assignment belongs to a finite instruction set of an RTZ-like machine whose word size is $N$. Since the program is fixed, every assignment can be compiled into a bounded number of such instructions.
For every simulated processor, maintain its current instruction together with its register contents and memory operand contents. Two instructions are declared identical when all three of these data coincide.
Sort the active instructions into classes of identical instructions. Because the instruction repertoire is finite, each class can be executed simultaneously. The cost of performing all instructions in one class is bounded by the cost of the corresponding arithmetic operation on fixed-point numbers, namely
$O((\log N)^3).$
The sorting process concerns at most
$$$$
distinct combinations of operand values. Hence all instructions for the current stage can be grouped and executed in
$$$$
steps.
Since
$$$$
we have
$$ (N\log N)^2 =O!\left(n^{2/13}(\log n)^2\right). $$
The Fourier-transform multiplication algorithm performs
$$$$
assignments. Therefore
$$ (N\log N)^2 =o(n) =O(K), $$
and consequently
$$ K+(N\log N)^2 =O(K). $$
Each FFT pass is therefore simulated in $O(K)$ steps.
Total complexity
The Schönhage-Strassen algorithm uses a fixed number of FFT passes and pointwise multiplications. Each pass costs $O(K)$, and the total number of assignments satisfies
$$$$
because the input length is $m$ bits.
Therefore the entire multiplication requires
$$$$
pointer-machine steps.
Hence the described pointer machine multiplies $m$-bit integers in linear time.
Verification
The construction phase creates exactly $N$ nodes, each with a bounded number of links, hence requires $O(N)$ steps.
Addition modulo $N$ uses at most $\lceil\log_2N\rceil$ binary-doubling traversals, giving $O(\log N)$ complexity.
A multiplication of two radix-$N$ digits uses $O(\log N)$ additions, each costing $O(\log N)$ steps, hence $O((\log N)^2)$. Since fixed-point numbers have exactly $13$ digits, a complete fixed-point multiplication costs $O((\log N)^3)$.
The grouping method executes one FFT pass in
$$$$
steps. Because
$$$$
and $K=\Theta(n)$, this becomes $O(K)$.
The number of FFT passes is bounded independently of $m$, so the total running time is proportional to $K=\Theta(m)$.
Therefore the multiplication of $m$-bit integers is achieved in $O(m)$ steps on the pointer machine.
This completes the proof.
∎