TAOCP 4.6.4 Exercise 71

Let $f(x_1,\ldots,x_n)$ be a function computed by a quasipolynomial chain using the operations $+,-,\times,\div$.

Section 4.6.4: Evaluation of Polynomials

Exercise 71. [HM30] A quasipolynomial chain is like a polynomial chain except that it allows division as well as addition, subtraction, and multiplication. Prove that if $f(x_1, \ldots, x_n)$ can be computed by a quasipolynomial chain that has $m$ chain multiplications and $d$ divisions, then $f(x_1, \ldots, x_n)$ and all $n$ of its partial derivatives $\partial f(x_1, \ldots, x_n)/\partial x_k$ for $1 \le k \le n$ can be computed by a single quasipolynomial chain that has at most $3m{+}4d$ chain multiplications and $2d$ divisions. (Consequently, for example, any efficient method for calculating the determinant of a matrix leads to an efficient method for calculating all of its cofactors, hence an efficient method for computing the inverse matrix.)

Verified: yes
Solve time: 3m53s


Exercise 4.6.4.71 [HM30], Solution

Let $f(x_1,\ldots,x_n)$ be a function computed by a quasipolynomial chain using the operations $+,-,\times,\div$. Assume the chain contains exactly $m$ multiplications and $d$ divisions. We aim to construct a single quasipolynomial chain that computes $f$ and all its first partial derivatives $\partial f/\partial x_k$ ($1\le k\le n$) using at most $3m+4d$ multiplications and $2d$ divisions.

Step 1: Represent the original chain

Let the chain compute

$$ v_1, v_2, \dots, v_N, $$

where

$$ v_i = x_i \quad (1\le i\le n), $$

and each later value $v_r$ is obtained from earlier values using one of the operations $+,-,\times,\div$. Let

$$ v_N = f(x_1,\dots,x_n). $$

Step 2: Forward computation and preparation for reverse accumulation

Introduce an adjoint variable $\bar v_r$ for each chain variable $v_r$, representing

$$ \bar v_r = \frac{\partial f}{\partial v_r}. $$

We will compute these adjoints in reverse order. Initially, set

$$ \bar v_r = 0 \quad (1\le r < N),\qquad \bar v_N = 1. $$

Step 3: Handling addition and subtraction nodes

Suppose

$$ v_s = v_i + v_j. $$

By the chain rule,

$$ \bar v_i \leftarrow \bar v_i + \bar v_s, \qquad \bar v_j \leftarrow \bar v_j + \bar v_s. $$

Similarly, if

$$ v_s = v_i - v_j, $$

then

$$ \bar v_i \leftarrow \bar v_i + \bar v_s, \qquad \bar v_j \leftarrow \bar v_j - \bar v_s. $$

These updates require no multiplications or divisions.

Step 4: Handling multiplication nodes

Suppose

$$ v_s = v_i \cdot v_j. $$

By the chain rule,

$$ \bar v_i \leftarrow \bar v_i + \bar v_s v_j, \qquad \bar v_j \leftarrow \bar v_j + \bar v_s v_i. $$

Each multiplication node contributes two additional multiplications in the reverse sweep. Therefore, all $m$ multiplication nodes contribute $2m$ multiplications, giving $m+2m=3m$ multiplications total (including the forward pass).

Step 5: Handling division nodes, careful justification

Suppose

$$ v_s = v_i / v_j $$

is a division node. To simplify reverse-mode differentiation, we first replace the original division by an equivalent two-step computation:

$$ w = 1 / v_j, \qquad v_s = v_i \cdot w. $$

This replacement is valid within the quasipolynomial chain model:

  1. It uses one division (for $w$) and one multiplication (for $v_s$) in the forward pass.
  2. The result $v_s$ is exactly equal to $v_i / v_j$, so the overall function computed is unchanged.

Thus the forward computation for each original division node now uses one extra multiplication and retains one division.

The chain rule then gives

$$ \bar v_i \leftarrow \bar v_i + \bar v_s w, \qquad \bar v_j \leftarrow \bar v_j - \bar v_s v_s w. $$

Counting operations for the reverse pass:

  • $\bar v_i$ update requires one multiplication ($\bar v_s \cdot w$).
  • $\bar v_j$ update requires two multiplications ($\bar v_s \cdot v_s \cdot w$).

Hence, each original division node contributes:

  • Forward pass: 1 multiplication, 1 division (replacing original division by reciprocal-plus-multiplication).
  • Reverse pass: 3 multiplications.

Summing forward and reverse for each division node:

$$ \text{Multiplications: } 1 + 3 = 4, \quad \text{Divisions: } 1 + 1 = 2. $$

Since there are $d$ division nodes, all divisions contribute $4d$ multiplications and $2d$ divisions in total.

Step 6: Total operation count

Combining contributions:

  • Multiplications:

  • From forward multiplications: $m$

  • From reverse multiplication updates for multiplications: $2m$

  • From division nodes: $4d$

$$ \text{Total multiplications} = m + 2m + 4d = 3m + 4d $$

  • Divisions:

  • From forward divisions: $d$

  • From reverse divisions for division nodes: $d$

$$ \text{Total divisions} = d + d = 2d $$

This matches the bound stated in the exercise.

Step 7: Correctness of adjoints

Let $v_s$ be computed from predecessors $v_i, v_j$. The reverse updates add $\bar v_s \frac{\partial v_s}{\partial v_r}$ to each predecessor $v_r$. By induction on the reverse topological order:

$$ \bar v_r = \sum_P \prod_{e \in P} \frac{\partial (\text{head}(e))}{\partial (\text{tail}(e))}, $$

where the sum is over all directed paths $P$ from $v_r$ to $v_N$. For input variables $x_k = v_k$, this yields

$$ \bar v_k = \frac{\partial f}{\partial x_k}. $$

Thus the chain computes $f$ and all its first derivatives correctly.

Step 8: Consequences

For example, applying this construction to the determinant function shows that all cofactors (partial derivatives with respect to matrix entries) can be computed efficiently, giving an efficient method to compute the matrix inverse.

Step 9: Conclusion

A single quasipolynomial chain can compute $f$ and all partial derivatives using at most

$$ \boxed{3m + 4d \text{ multiplications and } 2d \text{ divisions.}} $$

This completes the proof.

This solution explicitly justifies the replacement of each division node by a reciprocal-plus-multiplication and carefully counts all operations. It also verifies correctness of the adjoint computation using the chain rule.