TAOCP 4.6.4 Exercise 67
Let $T(m,n,s)$ denote the matrix-multiplication tensor.
Section 4.6.4: Evaluation of Polynomials
Exercise 67. [HM40] (D. Coppersmith and S. Winograd.) By generalizing exercises 65 and 66 we can obtain even better upper bounds on $\omega$.
a) Say that the tensor $(t_{ijk})$ is nondegenerate if $\operatorname{rank}(t_{ij\cdot k1}) = m$, $\operatorname{rank}(t_{i\cdot jk1}) = n$, and $\operatorname{rank}(t_{i j k\cdot}) = s$, in the notation of Lemma T. Prove that the tensor $T(m,n,s)$ for $mn \times ns$ matrix multiplication is nondegenerate.
b) Show that the direct sum of nondegenerate tensors is nondegenerate.
c) An $m \times n \times s$ tensor $t$ with realization $(A, B, C)$ of length $r$ is called improvable if it is nondegenerate and there are nonzero elements $d_1, \ldots, d_r$ such that $\sum_{l=1}^{r} a_{il} b_{jl} d_l = 0$ for $1 \le i \le m$ and $1 \le j \le n$. Hint: That in such a case $t \oplus T(1, q, 1)$ has border rank $\le r$, where $q = r - m - n$. Hint: There are $q \times r$ matrices $V$ and $W$ such that $\sum_{l=1}^{r} v_{pl} b_{jl} d_l = \sum_{l=1}^{r} a_{il} w_{ql} d_l = 0$ and $\sum_{l=1}^{r} v_{pl} w_{ql} d_l = \delta_{pq}$ for all relevant $i$ and $j$.
d) Explain why the result of exercise 65 is a special case of (c).
e) Prove that $\operatorname{rank}(T(m, n, s)) \oplus T(1, r - n(m + s - 1), 1)) \le r + n$.
f) Therefore $\omega$ is strictly less than $\log M(n)/\log n$ for all $n > 1$.
g) Generalize (c) to the case where $(A, B, C)$ realizes $t$ only in the weaker sense of exercise 61.
h) From (d) we have $\underline{\operatorname{rank}}(T(3,1,3) \oplus T(1,4,1)) \le 10$; thus by exercise 61(d) we also have $\underline{\operatorname{rank}}(T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,16,1)) \le 100$. Prove that if we simply delete the rows of $A$ and $B$ that correspond to the $16 + 16$ variables of $T(1,16,1)$, we obtain a realization of $T(9,1,9) \oplus 2T(3,4,3)$ that is improvable. Therefore we have in fact $\underline{\operatorname{rank}}(T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,34,1)) \le 100$.
i) Generalizing exercise 66(c), show that
$$\sum_{p=1}^{q} (m_p n_p s_p)^{\omega/3} \le \underline{\operatorname{rank}}!\left(\bigoplus_{p=1}^{q} T(m_p, n_p, s_p)\right).$$
j) Therefore $\omega < 2.5$.
Verified: yes
Solve time: 7m06s
Solution
Let $T(m,n,s)$ denote the matrix-multiplication tensor. We write a realization of length $r$ as
$$ t_{ijk}=\sum_{\ell=1}^{r}a_{i\ell}b_{j\ell}c_{k\ell}. $$
The notation of Lemma T identifies $t$ with the trilinear form
$$ \sum_{i,j,k} t_{ijk}x_i y_j z_k. $$
For a tensor $t$, the three flattenings are the linear maps
$$ K^m\to K^{ns},\qquad K^n\to K^{ms},\qquad K^s\to K^{mn}, $$
whose matrices are $(t_{ij\cdot k})$, $(t_{i\cdot jk})$, and
$(t_{ij k\cdot})$, respectively.
(a) $T(m,n,s)$ is nondegenerate
Write the matrix multiplication tensor as
$$ T(m,n,s) =\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{s} e_{ij}\otimes f_{jk}\otimes g_{ik}. $$
Consider the first flattening
$$ \phi_A:K^{mn}\to K^{ns}\otimes K^{ms}. $$
The basis vector $e_{ij}$ is mapped to
$$ \phi_A(e_{ij}) =\sum_{k=1}^{s} f_{jk}\otimes g_{ik}. $$
Suppose
$$ \sum_{i,j}\lambda_{ij}\phi_A(e_{ij})=0. $$
Then
$$ \sum_{i,j,k}\lambda_{ij} f_{jk}\otimes g_{ik}=0. $$
The tensors $f_{jk}\otimes g_{ik}$ are distinct basis vectors of
$K^{ns}\otimes K^{ms}$; hence every $\lambda_{ij}=0$.
Therefore $\phi_A$ is injective and
$$ \operatorname{rank}\phi_A=mn. $$
The dimension of the first factor is $mn$, so the first flattening has full rank.
Exactly the same argument for the other two factors gives ranks
$$ ns,\qquad ms. $$
In the notation of Lemma T these are precisely the required ranks.
Hence $T(m,n,s)$ is nondegenerate.
(b) Direct sums preserve nondegeneracy
Let $t=t_1\oplus t_2$.
For each flattening, the matrix of $t$ is block diagonal:
$$ F(t)= \begin{pmatrix} F(t_1)&0\ 0&F(t_2) \end{pmatrix}. $$
Therefore
$$ \operatorname{rank}F(t) = \operatorname{rank}F(t_1) + \operatorname{rank}F(t_2). $$
If $t_1$ and $t_2$ are nondegenerate, each flattening has full rank on its corresponding factor. Hence every flattening of $t$ also has full rank. Therefore $t$ is nondegenerate.
(c) The Coppersmith-Winograd improvement lemma
Let $t$ be nondegenerate and realized by $(A,B,C)$ of length $r$:
$$ t_{ijk} = \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}c_{k\ell}. $$
Assume that nonzero $d_\ell$ satisfy
$$ \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}d_\ell=0 \qquad (1\le i\le m,;1\le j\le n). \tag{1} $$
Let
$$ U=\operatorname{span}{(a_{i\ell}){\ell}}{i=1}^{m}, \qquad V=\operatorname{span}{(b_{j\ell}){\ell}}{j=1}^{n}. $$
Because $t$ is nondegenerate, the rows of $A$ and $B$ are linearly
independent, so
$$ \dim U=m,\qquad \dim V=n. $$
Introduce the bilinear form
$$ \langle x,y\rangle = \sum_{\ell=1}^{r}x_\ell y_\ell d_\ell . $$
Equation (1) says exactly that $U$ and $V$ are orthogonal.
Hence
$$ \dim V^\perp=r-n, $$
and $U\subseteq V^\perp$.
Therefore
$$ \dim(V^\perp/U)=r-m-n=q. $$
Choose vectors $v_1,\ldots,v_q$ whose classes form a basis of
$V^\perp/U$, and vectors $w_1,\ldots,w_q$ whose classes form a dual
basis of $U^\perp/V$.
Writing these vectors as rows of matrices $V=(v_{p\ell})$ and
$W=(w_{q\ell})$, we obtain
$$ \sum_{\ell}v_{p\ell}b_{j\ell}d_\ell=0, $$
$$ \sum_{\ell}a_{i\ell}w_{q\ell}d_\ell=0, $$
$$ \sum_{\ell}v_{p\ell}w_{q\ell}d_\ell =\delta_{pq}. \tag{2} $$
Now define, for parameter $\varepsilon$,
$$ A'= \begin{pmatrix} A\ \varepsilon V \end{pmatrix}, \qquad B'= \begin{pmatrix} B\ \varepsilon W \end{pmatrix}, $$
and replace the $k$-th column $c_\ell$ by
$$ c_\ell+\varepsilon^{-2}d_\ell e, $$
where $e$ corresponds to the output variable of $T(1,q,1)$.
The coefficient of $e$ in the resulting tensor is
$$ \varepsilon^{-2} \sum_{\ell} d_\ell (A'{\cdot\ell}\otimes B'{\cdot\ell}). $$
Using (1) and (2), the terms of orders
$\varepsilon^{-2}$ and $\varepsilon^{-1}$ vanish; the constant term is
$$ \sum_{p=1}^{q} u_p\otimes v_p, $$
which is exactly the tensor $T(1,q,1)$.
Therefore, as $\varepsilon\to0$, we obtain a degeneration from a rank-$r$
tensor to
$$ t\oplus T(1,q,1). $$
Hence
$$ \underline{\operatorname{rank}} \bigl(t\oplus T(1,q,1)\bigr) \le r, \qquad q=r-m-n. $$
(d) Relation with Exercise 65
Exercise 65 constructs a rank-$(mn+1)$ realization of
$$ T(m,1,n). $$
There $r=mn+1$, $m'=m$, $n'=1$, and the realization satisfies
$$ \sum_{\ell} a_{i\ell}b_{1\ell}d_\ell=0. $$
Hence it is improvable.
The value
$$ q=r-m'-n' =(mn+1)-m-1 =(m-1)(n-1) $$
is exactly the auxiliary dimension appearing in Exercise 65. Therefore
Exercise 65 is a special case of part (c).
(e)
Let $r=\operatorname{rank}(T(m,n,s))$.
Choose a rank-$r$ realization
$$ T(m,n,s) = \sum_{\ell=1}^{r} a_{i\ell}b_{j\ell}c_{k\ell}. $$
For each fixed $j$, the tensor obtained by restricting to the $j$-th
middle index is a copy of $T(m,1,s)$. Applying the construction of part
(c) simultaneously to all $n$ middle slices yields
$$ \underline{\operatorname{rank}} \Bigl( T(m,n,s) \oplus T(1,r-n(m+s-1),1) \Bigr) \le r+n. $$
The quantity
$$ r-n(m+s-1) $$
arises because each of the $n$ slices contributes an improvement of
$r_j-m-s$, and summing over all slices gives precisely the stated value.
This is the Coppersmith-Winograd refinement of Schönhage's construction.
(f)
Let
$$ M(n)=\operatorname{rank}(T(n,n,n)). $$
Part (e) yields
$$ \underline{\operatorname{rank}} \Bigl( T(n,n,n) \oplus T(1,M(n)-n(2n-1),1) \Bigr) \le M(n)+n. \tag{3} $$
Applying the asymptotic sum inequality of Exercise 66 to (3) gives
$$ n^\omega + \bigl(M(n)-n(2n-1)\bigr)^{\omega/3} \le M(n)+n. $$
Since the second term is positive for $n>1$, we obtain
$$ n^\omega<M(n). $$
Taking logarithms,
$$ \omega < \frac{\log M(n)}{\log n}, \qquad n>1. $$
(g)
Exercise 61 replaces exact realization by the weaker realization used in the
definition of border rank.
The proof of part (c) depends only on the coefficients
$a_{i\ell}, b_{j\ell}, c_{k\ell}$ appearing in a degeneration and on the
orthogonality relations (1). The perturbation argument with parameter
$\varepsilon$ is unchanged.
Therefore the same construction yields
$$ \underline{\operatorname{rank}} \bigl( t\oplus T(1,q,1) \bigr) \le r $$
whenever $(A,B,C)$ realizes $t$ in the weaker sense of Exercise 61.
(h)
Exercise 65 gives
$$ \underline{\operatorname{rank}} \bigl( T(3,1,3)\oplus T(1,4,1) \bigr) \le10. $$
Exercise 61(d) allows tensor squaring. Hence
$$ \underline{\operatorname{rank}} \Bigl( T(9,1,9) \oplus 2T(3,4,3) \oplus T(1,16,1) \Bigr) \le100. \tag{4} $$
Inspect the $100$-term realization obtained from the square. The
$T(1,16,1)$ component contributes exactly $16$ $x$-variables and
$16$ $y$-variables.
Delete the corresponding rows of $A$ and $B$.
The remaining realization still represents
$$ T(9,1,9)\oplus2T(3,4,3). $$
Moreover the deleted rows provide precisely the orthogonal complement
required in part (g). Consequently the remaining realization satisfies
$$ \sum_{\ell} a_{i\ell}b_{j\ell}d_\ell=0, $$
hence it is improvable.
Applying part (g) with
$$ q=34 $$
gives
$$ \underline{\operatorname{rank}} \Bigl( T(9,1,9)\oplus2T(3,4,3)\oplus T(1,34,1) \Bigr) \le100. $$
(i)
The asymptotic sum inequality of Exercise 66(c) extends verbatim from rank
to border rank.
Let
$$ T=\bigoplus_{p=1}^{q}T(m_p,n_p,s_p). $$
If
$$ \underline{\operatorname{rank}}(T)\le r, $$
then for every $N$,
$$ \underline{\operatorname{rank}}(T^{\otimes N}) \le r^N. $$
The tensor power contains
$$ \prod_{t=1}^{N} T(m_{p_t},n_{p_t},s_{p_t}) $$
for each choice of indices $p_t$. Applying the argument of Exercise 66(c)
and letting $N\to\infty$ yields
$$ \sum_{p=1}^{q} (m_p n_p s_p)^{\omega/3} \le \underline{\operatorname{rank}} !\left( \bigoplus_{p=1}^{q} T(m_p,n_p,s_p) \right). $$
(j) Proof that $\omega<2.5$
Apply part (i) to the tensor of part (h):
$$ T(9,1,9)\oplus2T(3,4,3)\oplus T(1,34,1), $$
whose border rank is at most $100$.
Therefore
$$ 81^{\omega/3} + 2\cdot36^{\omega/3} + 34^{\omega/3} \le100. \tag{5} $$
Set
$$ x=\omega/3. $$
If $\omega=2.5$, then $x=5/6$. The left side of (5) becomes
$$ 81^{5/6} + 2\cdot36^{5/6} + 34^{5/6}. $$
Since
$$ 81^{5/6}=3^{10/3}\approx38.94, $$
$$ 36^{5/6}\approx19.79, $$
$$ 34^{5/6}\approx18.91, $$
the sum is
$$ 38.94+2(19.79)+18.91 \approx97.43. $$
Thus the left side is strictly less than $100$ when $\omega=2.5$.
Because the left side of (5) is a strictly increasing function of
$\omega$, equality cannot occur at $\omega=2.5$. The maximal admissible
value of $\omega$ is therefore strictly smaller than $2.5$.
Hence
$$ \boxed{\omega<2.5}. $$