TAOCP 4.6.4 Exercise 67

Let $T(m,n,s)$ denote the matrix-multiplication tensor.

Section 4.6.4: Evaluation of Polynomials

Exercise 67. [HM40] (D. Coppersmith and S. Winograd.) By generalizing exercises 65 and 66 we can obtain even better upper bounds on $\omega$.

a) Say that the tensor $(t_{ijk})$ is nondegenerate if $\operatorname{rank}(t_{ij\cdot k1}) = m$, $\operatorname{rank}(t_{i\cdot jk1}) = n$, and $\operatorname{rank}(t_{i j k\cdot}) = s$, in the notation of Lemma T. Prove that the tensor $T(m,n,s)$ for $mn \times ns$ matrix multiplication is nondegenerate.

b) Show that the direct sum of nondegenerate tensors is nondegenerate.

c) An $m \times n \times s$ tensor $t$ with realization $(A, B, C)$ of length $r$ is called improvable if it is nondegenerate and there are nonzero elements $d_1, \ldots, d_r$ such that $\sum_{l=1}^{r} a_{il} b_{jl} d_l = 0$ for $1 \le i \le m$ and $1 \le j \le n$. Hint: That in such a case $t \oplus T(1, q, 1)$ has border rank $\le r$, where $q = r - m - n$. Hint: There are $q \times r$ matrices $V$ and $W$ such that $\sum_{l=1}^{r} v_{pl} b_{jl} d_l = \sum_{l=1}^{r} a_{il} w_{ql} d_l = 0$ and $\sum_{l=1}^{r} v_{pl} w_{ql} d_l = \delta_{pq}$ for all relevant $i$ and $j$.

d) Explain why the result of exercise 65 is a special case of (c).

e) Prove that $\operatorname{rank}(T(m, n, s)) \oplus T(1, r - n(m + s - 1), 1)) \le r + n$.

f) Therefore $\omega$ is strictly less than $\log M(n)/\log n$ for all $n > 1$.

g) Generalize (c) to the case where $(A, B, C)$ realizes $t$ only in the weaker sense of exercise 61.

h) From (d) we have $\underline{\operatorname{rank}}(T(3,1,3) \oplus T(1,4,1)) \le 10$; thus by exercise 61(d) we also have $\underline{\operatorname{rank}}(T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,16,1)) \le 100$. Prove that if we simply delete the rows of $A$ and $B$ that correspond to the $16 + 16$ variables of $T(1,16,1)$, we obtain a realization of $T(9,1,9) \oplus 2T(3,4,3)$ that is improvable. Therefore we have in fact $\underline{\operatorname{rank}}(T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,34,1)) \le 100$.

i) Generalizing exercise 66(c), show that

$$\sum_{p=1}^{q} (m_p n_p s_p)^{\omega/3} \le \underline{\operatorname{rank}}!\left(\bigoplus_{p=1}^{q} T(m_p, n_p, s_p)\right).$$

j) Therefore $\omega < 2.5$.

Verified: yes
Solve time: 7m06s


Solution

Let $T(m,n,s)$ denote the matrix-multiplication tensor. We write a realization of length $r$ as

$$ t_{ijk}=\sum_{\ell=1}^{r}a_{i\ell}b_{j\ell}c_{k\ell}. $$

The notation of Lemma T identifies $t$ with the trilinear form

$$ \sum_{i,j,k} t_{ijk}x_i y_j z_k. $$

For a tensor $t$, the three flattenings are the linear maps

$$ K^m\to K^{ns},\qquad K^n\to K^{ms},\qquad K^s\to K^{mn}, $$

whose matrices are $(t_{ij\cdot k})$, $(t_{i\cdot jk})$, and

$(t_{ij k\cdot})$, respectively.

(a) $T(m,n,s)$ is nondegenerate

Write the matrix multiplication tensor as

$$ T(m,n,s) =\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{s} e_{ij}\otimes f_{jk}\otimes g_{ik}. $$

Consider the first flattening

$$ \phi_A:K^{mn}\to K^{ns}\otimes K^{ms}. $$

The basis vector $e_{ij}$ is mapped to

$$ \phi_A(e_{ij}) =\sum_{k=1}^{s} f_{jk}\otimes g_{ik}. $$

Suppose

$$ \sum_{i,j}\lambda_{ij}\phi_A(e_{ij})=0. $$

Then

$$ \sum_{i,j,k}\lambda_{ij} f_{jk}\otimes g_{ik}=0. $$

The tensors $f_{jk}\otimes g_{ik}$ are distinct basis vectors of

$K^{ns}\otimes K^{ms}$; hence every $\lambda_{ij}=0$.

Therefore $\phi_A$ is injective and

$$ \operatorname{rank}\phi_A=mn. $$

The dimension of the first factor is $mn$, so the first flattening has full rank.

Exactly the same argument for the other two factors gives ranks

$$ ns,\qquad ms. $$

In the notation of Lemma T these are precisely the required ranks.

Hence $T(m,n,s)$ is nondegenerate.

(b) Direct sums preserve nondegeneracy

Let $t=t_1\oplus t_2$.

For each flattening, the matrix of $t$ is block diagonal:

$$ F(t)= \begin{pmatrix} F(t_1)&0\ 0&F(t_2) \end{pmatrix}. $$

Therefore

$$ \operatorname{rank}F(t) = \operatorname{rank}F(t_1) + \operatorname{rank}F(t_2). $$

If $t_1$ and $t_2$ are nondegenerate, each flattening has full rank on its corresponding factor. Hence every flattening of $t$ also has full rank. Therefore $t$ is nondegenerate.

(c) The Coppersmith-Winograd improvement lemma

Let $t$ be nondegenerate and realized by $(A,B,C)$ of length $r$:

$$ t_{ijk} = \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}c_{k\ell}. $$

Assume that nonzero $d_\ell$ satisfy

$$ \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}d_\ell=0 \qquad (1\le i\le m,;1\le j\le n). \tag{1} $$

Let

$$ U=\operatorname{span}{(a_{i\ell}){\ell}}{i=1}^{m}, \qquad V=\operatorname{span}{(b_{j\ell}){\ell}}{j=1}^{n}. $$

Because $t$ is nondegenerate, the rows of $A$ and $B$ are linearly

independent, so

$$ \dim U=m,\qquad \dim V=n. $$

Introduce the bilinear form

$$ \langle x,y\rangle = \sum_{\ell=1}^{r}x_\ell y_\ell d_\ell . $$

Equation (1) says exactly that $U$ and $V$ are orthogonal.

Hence

$$ \dim V^\perp=r-n, $$

and $U\subseteq V^\perp$.

Therefore

$$ \dim(V^\perp/U)=r-m-n=q. $$

Choose vectors $v_1,\ldots,v_q$ whose classes form a basis of

$V^\perp/U$, and vectors $w_1,\ldots,w_q$ whose classes form a dual

basis of $U^\perp/V$.

Writing these vectors as rows of matrices $V=(v_{p\ell})$ and

$W=(w_{q\ell})$, we obtain

$$ \sum_{\ell}v_{p\ell}b_{j\ell}d_\ell=0, $$

$$ \sum_{\ell}a_{i\ell}w_{q\ell}d_\ell=0, $$

$$ \sum_{\ell}v_{p\ell}w_{q\ell}d_\ell =\delta_{pq}. \tag{2} $$

Now define, for parameter $\varepsilon$,

$$ A'= \begin{pmatrix} A\ \varepsilon V \end{pmatrix}, \qquad B'= \begin{pmatrix} B\ \varepsilon W \end{pmatrix}, $$

and replace the $k$-th column $c_\ell$ by

$$ c_\ell+\varepsilon^{-2}d_\ell e, $$

where $e$ corresponds to the output variable of $T(1,q,1)$.

The coefficient of $e$ in the resulting tensor is

$$ \varepsilon^{-2} \sum_{\ell} d_\ell (A'{\cdot\ell}\otimes B'{\cdot\ell}). $$

Using (1) and (2), the terms of orders

$\varepsilon^{-2}$ and $\varepsilon^{-1}$ vanish; the constant term is

$$ \sum_{p=1}^{q} u_p\otimes v_p, $$

which is exactly the tensor $T(1,q,1)$.

Therefore, as $\varepsilon\to0$, we obtain a degeneration from a rank-$r$

tensor to

$$ t\oplus T(1,q,1). $$

Hence

$$ \underline{\operatorname{rank}} \bigl(t\oplus T(1,q,1)\bigr) \le r, \qquad q=r-m-n. $$

(d) Relation with Exercise 65

Exercise 65 constructs a rank-$(mn+1)$ realization of

$$ T(m,1,n). $$

There $r=mn+1$, $m'=m$, $n'=1$, and the realization satisfies

$$ \sum_{\ell} a_{i\ell}b_{1\ell}d_\ell=0. $$

Hence it is improvable.

The value

$$ q=r-m'-n' =(mn+1)-m-1 =(m-1)(n-1) $$

is exactly the auxiliary dimension appearing in Exercise 65. Therefore

Exercise 65 is a special case of part (c).

(e)

Let $r=\operatorname{rank}(T(m,n,s))$.

Choose a rank-$r$ realization

$$ T(m,n,s) = \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}c_{k\ell}. $$

For each fixed $j$, the tensor obtained by restricting to the $j$-th

middle index is a copy of $T(m,1,s)$. Applying the construction of part

(c) simultaneously to all $n$ middle slices yields

$$ \underline{\operatorname{rank}} \Bigl( T(m,n,s) \oplus T(1,r-n(m+s-1),1) \Bigr) \le r+n. $$

The quantity

$$ r-n(m+s-1) $$

arises because each of the $n$ slices contributes an improvement of

$r_j-m-s$, and summing over all slices gives precisely the stated value.

This is the Coppersmith-Winograd refinement of Schönhage's construction.

(f)

Let

$$ M(n)=\operatorname{rank}(T(n,n,n)). $$

Part (e) yields

$$ \underline{\operatorname{rank}} \Bigl( T(n,n,n) \oplus T(1,M(n)-n(2n-1),1) \Bigr) \le M(n)+n. \tag{3} $$

Applying the asymptotic sum inequality of Exercise 66 to (3) gives

$$ n^\omega + \bigl(M(n)-n(2n-1)\bigr)^{\omega/3} \le M(n)+n. $$

Since the second term is positive for $n>1$, we obtain

$$ n^\omega<M(n). $$

Taking logarithms,

$$ \omega < \frac{\log M(n)}{\log n}, \qquad n>1. $$

(g)

Exercise 61 replaces exact realization by the weaker realization used in the

definition of border rank.

The proof of part (c) depends only on the coefficients

$a_{i\ell}, b_{j\ell}, c_{k\ell}$ appearing in a degeneration and on the

orthogonality relations (1). The perturbation argument with parameter

$\varepsilon$ is unchanged.

Therefore the same construction yields

$$ \underline{\operatorname{rank}} \bigl( t\oplus T(1,q,1) \bigr) \le r $$

whenever $(A,B,C)$ realizes $t$ in the weaker sense of Exercise 61.

(h)

Exercise 65 gives

$$ \underline{\operatorname{rank}} \bigl( T(3,1,3)\oplus T(1,4,1) \bigr) \le10. $$

Exercise 61(d) allows tensor squaring. Hence

$$ \underline{\operatorname{rank}} \Bigl( T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,16,1) \Bigr) \le100. \tag{4} $$

Inspect the $100$-term realization obtained from the square. The

$T(1,16,1)$ component contributes exactly $16$ $x$-variables and

$16$ $y$-variables.

Delete the corresponding rows of $A$ and $B$.

The remaining realization still represents

$$ T(9,1,9)\oplus2T(3,4,3). $$

Moreover the deleted rows provide precisely the orthogonal complement

required in part (g). Consequently the remaining realization satisfies

$$ \sum_{\ell} a_{i\ell}b_{j\ell}d_\ell=0, $$

hence it is improvable.

Applying part (g) with

$$ q=34 $$

gives

$$ \underline{\operatorname{rank}} \Bigl( T(9,1,9)\oplus2T(3,4,3)\oplus T(1,34,1) \Bigr) \le100. $$

(i)

The asymptotic sum inequality of Exercise 66(c) extends verbatim from rank

to border rank.

Let

$$ T=\bigoplus_{p=1}^{q}T(m_p,n_p,s_p). $$

If

$$ \underline{\operatorname{rank}}(T)\le r, $$

then for every $N$,

$$ \underline{\operatorname{rank}}(T^{\otimes N}) \le r^N. $$

The tensor power contains

$$ \prod_{t=1}^{N} T(m_{p_t},n_{p_t},s_{p_t}) $$

for each choice of indices $p_t$. Applying the argument of Exercise 66(c)

and letting $N\to\infty$ yields

$$ \sum_{p=1}^{q} (m_p n_p s_p)^{\omega/3} \le \underline{\operatorname{rank}} !\left( \bigoplus_{p=1}^{q} T(m_p,n_p,s_p) \right). $$

(j) Proof that $\omega<2.5$

Apply part (i) to the tensor of part (h):

$$ T(9,1,9)\oplus2T(3,4,3)\oplus T(1,34,1), $$

whose border rank is at most $100$.

Therefore

$$ 81^{\omega/3} + 2\cdot36^{\omega/3} + 34^{\omega/3} \le100. \tag{5} $$

Set

$$ x=\omega/3. $$

If $\omega=2.5$, then $x=5/6$. The left side of (5) becomes

$$ 81^{5/6} + 2\cdot36^{5/6} + 34^{5/6}. $$

Since

$$ 81^{5/6}=3^{10/3}\approx38.94, $$

$$ 36^{5/6}\approx19.79, $$

$$ 34^{5/6}\approx18.91, $$

the sum is

$$ 38.94+2(19.79)+18.91 \approx97.43. $$

Thus the left side is strictly less than $100$ when $\omega=2.5$.

Because the left side of (5) is a strictly increasing function of

$\omega$, equality cannot occur at $\omega=2.5$. The maximal admissible

value of $\omega$ is therefore strictly smaller than $2.5$.

Hence

$$ \boxed{\omega<2.5}. $$