12.4 Calculations

Let \(f : V \to V\) be an endomorphism of the finite dimensional \(\mathbb{K}\)-vector space \(V.\) A generalised eigenvector of rank \(m \in \mathbb{N}\) with eigenvalue \(\lambda \in \mathbb{K}\) of \(f\) is an element of \(U_m=\operatorname{Ker}g^m_{\lambda},\) where \(g_{\lambda}=f-\lambda\mathrm{Id}_V.\) Therefore, we have at most \(\dim \operatorname{Ker}g^m_{\lambda}\) linearly independent generalised eigenvectors of rank \(m.\) However, the subspace \(\operatorname{Ker}g^m_{\lambda}\) also contains generalised eigenvectors of rank \(j\) for \(1\leqslant j\leqslant m-1,\) and those are elements of \(U_{m-1}=\operatorname{Ker}g^{m-1}_{\lambda}\subset U_m.\) The number \(\rho_m(\lambda)\) of generalised eigenvectors of rank \(m\) with eigenvalue \(\lambda\) of \(f\) in a Jordan basis of \(f\) is thus given by the dimension of the quotient vector space \(U_m/U_{m-1}.\) For \(\lambda \in \mathbb{K}\) and \(m \in \mathbb{N}\) we define \[\rho_m(\lambda)=\dim(U_{m}/U_{m-1})=\dim \operatorname{Ker}(g_{\lambda}^m)-\dim\operatorname{Ker}(g_{\lambda}^{m-1}),\] where the second equality uses Proposition 7.10. Using the rank-nullity Theorem 3.76, we obtain \[\rho_m(\lambda)=\dim V-\operatorname{rank}g_{\lambda}^{m}-(\dim V-\operatorname{rank}g_{\lambda}^{m-1})=\operatorname{rank}g^{m-1}_{\lambda}-\operatorname{rank}g^m_{\lambda}.\] There are only finitely many integers \(m\) for which \(\rho_m(\lambda)\) is non-zero. This follows from the following observation:
Lemma 12.21

Let \(g : V \to V\) be an endomorphism of the \(\mathbb{K}\)-vector space \(V\) and suppose there exists \(m \in \mathbb{N}\) such that \[\operatorname{Ker}(g^{m+1})=\operatorname{Ker}(g^m).\] Then we have \[\operatorname{Ker}(g^{m})=\operatorname{Ker}(g^{m+1})=\operatorname{Ker}(g^{m+2})=\operatorname{Ker}(g^{m+3})=\operatorname{Ker}(g^{m+4})=\cdots\]

Proof. Let \(k \in \mathbb{N}\) be arbitrary. We want to show that \(\operatorname{Ker}(g^{m+k})=\operatorname{Ker}(g^{m+k+1}).\) Since \(\operatorname{Ker}(g^{m+k})\subset \operatorname{Ker}(g^{m+k+1})\) we only need to show that \(\operatorname{Ker}(g^{m+k+1})\subset \operatorname{Ker}(g^{m+k}).\) Let \(v \in \operatorname{Ker}(g^{m+k+1}).\) Then \[g^{m+1}(g^k(v))=g^{m+k+1}(v)=0_V\] and hence \(g^k(v) \in \operatorname{Ker}(g^{m+1})=\operatorname{Ker}(g^{m}).\) This implies that \(g^m(g^k(v))=g^{m+k}(v)=0_V,\) therefore \(v \in \operatorname{Ker}(g^{m+k})\) which shows that \(\operatorname{Ker}(g^{m+k+1})\subset \operatorname{Ker}(g^{m+k}).\)
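
The stabilisation of the kernels in Lemma 12.21 is easy to watch numerically. The following sketch (assuming Python with numpy is available; the nilpotent matrix \(G\) is our own toy example, not one from the text) tracks \(\dim\operatorname{Ker}(g^m)=n-\operatorname{rank}(g^m)\) using the rank-nullity theorem:

```python
import numpy as np

# Illustration of Lemma 12.21: once dim Ker(g^m) stops growing, it stays
# constant.  We compute dim Ker(g^m) = n - rank(g^m) for increasing m.
G = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])   # toy nilpotent example (our choice)
n = G.shape[0]
kernel_dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(G, m))
               for m in range(5)]
# kernel_dims grows strictly until two consecutive values agree,
# after which it is constant, as the lemma predicts.
```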

Example 12.22

Let \[\mathbf{A}=\begin{pmatrix} 2 & 1 & -1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 4 \end{pmatrix}.\] Since \(\mathbf{A}\) is an upper triangular matrix we see immediately that its eigenvalues are \(\lambda_1=2\) and \(\lambda_2=4.\) We compute \[\mathbf{A}-2\cdot \mathbf{1}_{6}=\begin{pmatrix} 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 2 \end{pmatrix}\] and \[(\mathbf{A}-2\cdot \mathbf{1}_{6})^2=\begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 1\\0 & 0 & 0 & 0 & 0 & 2\\0 & 0 & 0 & 0 & 0 & 4\end{pmatrix}, \quad (\mathbf{A}-2\cdot \mathbf{1}_{6})^3=\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0& 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 2\\0 & 0 & 0 & 0 & 0 & 4\\0 & 0 & 0 & 0 & 0 & 8\end{pmatrix}\] From the expression for \((\mathbf{A}-2\cdot \mathbf{1}_{6})^3\) we conclude that \(\operatorname{rank}((\mathbf{A}-2\cdot \mathbf{1}_{6})^k)=1\) for \(k\geqslant 3.\) We thus obtain \[\begin{aligned} \rho_1(2)&=\operatorname{rank}\mathbf{1}_{6}-\operatorname{rank}(\mathbf{A}-2\cdot \mathbf{1}_{6})=6-4=2,\\ \rho_2(2)&=\operatorname{rank}(\mathbf{A}-2\cdot \mathbf{1}_{6})-\operatorname{rank}((\mathbf{A}-2\cdot \mathbf{1}_{6})^2)=4-2=2,\\ \rho_3(2)&=\operatorname{rank}((\mathbf{A}-2\cdot \mathbf{1}_{6})^2)-\operatorname{rank}((\mathbf{A}-2\cdot \mathbf{1}_{6})^3)=2-1=1,\\ \rho_k(2)&=0, \;k\geqslant 4. 
\end{aligned}\] A Jordan basis of \(f_\mathbf{A}\) thus contains \(1=\rho_3(2)\) generalised eigenvector of rank \(3\) with eigenvalue \(2.\) Since \[\operatorname{Ker}((\mathbf{A}-2\cdot \mathbf{1}_{6})^3)=\operatorname{span}\{\vec{e}_1,\vec{e}_2,\vec{e}_3,\vec{e}_4,\vec{e}_5\}\] and \((\mathbf{A}-2\cdot \mathbf{1}_{6})^2\vec{e}_i=0_{\mathbb{K}^6}\) for \(i\neq 3,6\) we conclude that \(\vec{e}_3\) is a generalised eigenvector of rank \(3\) with eigenvalue \(2.\) The first three vectors of a Jordan basis of \(f_\mathbf{A}\) are thus given by \[(\mathbf{A}-2\cdot\mathbf{1}_{6})^2\vec{e}_3=\vec{e}_1, \quad (\mathbf{A}-2\cdot\mathbf{1}_{6})\vec{e}_3=-\vec{e}_1+\vec{e}_2,\quad \vec{e}_3.\]

By construction, \(-\vec{e}_1+\vec{e}_2\) is a generalised eigenvector of rank \(2\) and since \(\rho_2(2)=2,\) there must be one more generalised eigenvector of rank \(2\) in a Jordan basis of \(f_\mathbf{A}.\) We compute \[\operatorname{Ker}((\mathbf{A}-2\cdot \mathbf{1}_{6})^2)=\operatorname{span}\{\vec{e}_1,\vec{e}_2,\vec{e}_4,\vec{e}_5\}\] and observe that \((\mathbf{A}-2\cdot\mathbf{1}_{6})\vec{e}_2\neq 0_{\mathbb{K}^6}\) and \((\mathbf{A}-2\cdot\mathbf{1}_{6})\vec{e}_5\neq 0_{\mathbb{K}^6}.\) While \(\vec{e}_2\) is a generalised eigenvector of rank \(2\) with eigenvalue \(2,\) it is not linearly independent from our first three Jordan basis vectors \(\{\vec{e}_1,-\vec{e}_1+\vec{e}_2,\vec{e}_3\}.\) The vector \(\vec{e}_5\) is however linearly independent from the previous Jordan basis vectors and we obtain \((\mathbf{A}-2\cdot \mathbf{1}_{6})\vec{e}_5=\vec{e}_4.\) The linearly independent vectors \(\vec{e}_1,-\vec{e}_1+\vec{e}_2,\vec{e}_3,\vec{e}_4,\vec{e}_5\) thus span \(\mathcal{E}_\mathbf{A}(2).\)

The eigenvalue \(\lambda_2=4\) has algebraic multiplicity \(1\) and hence also geometric multiplicity \(1.\) We compute \[\mathcal{E}_\mathbf{A}(4)=\operatorname{Eig}_\mathbf{A}(4)=\operatorname{span}\{\vec{e}_4+2\vec{e}_5+4\vec{e}_6\}.\] Summarising, an ordered Jordan basis of \(f_\mathbf{A}\) is given by \[\mathbf{b}=(\vec{e}_1,-\vec{e}_1+\vec{e}_2,\vec{e}_3,\vec{e}_4,\vec{e}_5,\vec{e}_4+2\vec{e}_5+4\vec{e}_6)\] and by construction, we have \[\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})=\operatorname{diag}(\mathbf{J}_3(2),\mathbf{J}_2(2),\mathbf{J}_1(4)),\] as can also be verified by direct computation.
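
The computations of Example 12.22 can be double-checked numerically. The sketch below (assuming Python with numpy) recomputes \(\rho_m(2)=\operatorname{rank}((\mathbf{A}-2\cdot\mathbf{1}_6)^{m-1})-\operatorname{rank}((\mathbf{A}-2\cdot\mathbf{1}_6)^{m})\) and verifies that the Jordan basis conjugates \(\mathbf{A}\) into its Jordan normal form:

```python
import numpy as np

# Check of Example 12.22: the ranks of (A - 2I)^m, the resulting rho_m(2),
# and the change of basis C built from the Jordan basis vectors.
A = np.array([[2., 1., -1., 0., 0., 0.],
              [0., 2., 1., 0., 0., 0.],
              [0., 0., 2., 0., 0., 0.],
              [0., 0., 0., 2., 1., 0.],
              [0., 0., 0., 0., 2., 1.],
              [0., 0., 0., 0., 0., 4.]])
N = A - 2 * np.eye(6)
ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(N, m)) for m in range(4)]
rho = [ranks[m - 1] - ranks[m] for m in range(1, 4)]  # rho_1(2), rho_2(2), rho_3(2)

# Columns of C are the Jordan basis vectors b_1, ..., b_6 from the text.
C = np.array([[1., -1., 0., 0., 0., 0.],
              [0., 1., 0., 0., 0., 0.],
              [0., 0., 1., 0., 0., 0.],
              [0., 0., 0., 1., 0., 1.],
              [0., 0., 0., 0., 1., 2.],
              [0., 0., 0., 0., 0., 4.]])
J = np.linalg.inv(C) @ A @ C   # should equal diag(J_3(2), J_2(2), J_1(4))
```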

Example 12.23

Let \[\mathbf{A}=\begin{pmatrix} 1 & 1 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.\] Here we have a single eigenvalue \(1\) of algebraic multiplicity \(4.\) We obtain \[\mathbf{A}-1\cdot\mathbf{1}_{4}=\begin{pmatrix} 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0\end{pmatrix}\quad \text{and}\quad (\mathbf{A}-1\cdot\mathbf{1}_{4})^2=\mathbf{0}_4.\] Correspondingly, we compute \(\rho_2(1)=2\) and \(\rho_1(1)=2.\) A Jordan basis thus contains \(2=\rho_2(1)\) generalised eigenvectors of rank \(2\) with eigenvalue \(1\) and those can be chosen to be \(\vec{e}_2\) and \(\vec{e}_4.\) We obtain \((\mathbf{A}-1\cdot\mathbf{1}_{4})\vec{e}_2=\vec{e}_1\) and \((\mathbf{A}-1\cdot\mathbf{1}_{4})\vec{e}_4=-\vec{e}_1+\vec{e}_3.\) Summarising, an ordered Jordan basis of \(f_\mathbf{A}\) is given by \[\mathbf{b}=(\vec{e}_1,\vec{e}_2,-\vec{e}_1+\vec{e}_3,\vec{e}_4)\] and by construction, we have \[\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})=\operatorname{diag}(\mathbf{J}_2(1),\mathbf{J}_2(1))\] as can also be verified by direct computation.
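
The "direct computation" mentioned at the end of Example 12.23 can be sketched as follows (assuming Python with numpy): we check that \((\mathbf{A}-\mathbf{1}_4)^2=\mathbf{0}_4\) and that the Jordan basis \(\mathbf{b}=(\vec{e}_1,\vec{e}_2,-\vec{e}_1+\vec{e}_3,\vec{e}_4)\) produces \(\operatorname{diag}(\mathbf{J}_2(1),\mathbf{J}_2(1)).\)

```python
import numpy as np

# Check of Example 12.23: (A - I)^2 = 0 and the Jordan basis b yields
# the block diagonal matrix diag(J_2(1), J_2(1)).
A = np.array([[1., 1., 0., -1.],
              [0., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 0., 1.]])
N = A - np.eye(4)
C = np.array([[1., 0., -1., 0.],    # columns: e1, e2, -e1 + e3, e4
              [0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
J = np.linalg.inv(C) @ A @ C   # should equal diag(J_2(1), J_2(1))
```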

Example 12.24

Let \[\mathbf{A}=\begin{pmatrix} 4 & 0 & 1 & 0 \\ 2 & 2 & 3 & 0 \\ -1 & 0 & 2 & 0 \\ 4 & 0 & 1 & 2\end{pmatrix}.\] Here the characteristic polynomial is \(\operatorname{char}_\mathbf{A}(x)=(x-3)^2(x-2)^2\) so that we have eigenvalues \(\lambda_1=3\) and \(\lambda_2=2,\) both with algebraic multiplicity \(2.\) As before, we compute that \(\rho_2(3)=1\) and \(\rho_1(3)=1\) so that a Jordan basis for \(f_\mathbf{A}\) contains \(1=\rho_2(3)\) generalised eigenvector of rank \(2\) and \(1=\rho_1(3)\) generalised eigenvector of rank \(1,\) both with eigenvalue \(3.\) The generalised eigenvector of rank \(2\) can be chosen to be \(\vec{e}_1+3\vec{e}_2+\vec{e}_4\) and hence \((\mathbf{A}-3\cdot\mathbf{1}_{4})(\vec{e}_1+3\vec{e}_2+\vec{e}_4)=\vec{e}_1-\vec{e}_2-\vec{e}_3+3\vec{e}_4\) is the corresponding generalised eigenvector of rank \(1.\)

Likewise, we obtain \(\rho_1(2)=2\) so that a Jordan basis contains two eigenvectors (of rank \(1\)) with eigenvalue \(2.\) These can be chosen to be \(\vec{e}_2\) and \(\vec{e}_4.\)

Summarising, an ordered Jordan basis of \(f_\mathbf{A}\) is given by \[\mathbf{b}=(\vec{e}_1-\vec{e}_2-\vec{e}_3+3\vec{e}_4,\vec{e}_1+3\vec{e}_2+\vec{e}_4,\vec{e}_2,\vec{e}_4)\] and by construction, we have \[\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})=\operatorname{diag}(\mathbf{J}_2(3),\mathbf{J}_1(2),\mathbf{J}_1(2))\] as can also be verified by direct computation.
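
Example 12.24 can likewise be verified numerically. The sketch below (assuming Python with numpy; the names `v1`, `v2` are ours) checks the chain \((\mathbf{A}-3\cdot\mathbf{1}_4)v_2=v_1,\) \((\mathbf{A}-3\cdot\mathbf{1}_4)v_1=0,\) and the resulting Jordan form:

```python
import numpy as np

# Check of Example 12.24: the generalised eigenvector chain for lambda = 3
# and the Jordan form diag(J_2(3), J_1(2), J_1(2)).
A = np.array([[4., 0., 1., 0.],
              [2., 2., 3., 0.],
              [-1., 0., 2., 0.],
              [4., 0., 1., 2.]])
v2 = np.array([1., 3., 0., 1.])        # e1 + 3 e2 + e4, rank 2
v1 = (A - 3 * np.eye(4)) @ v2          # should be e1 - e2 - e3 + 3 e4, rank 1
C = np.column_stack([v1, v2,
                     np.array([0., 1., 0., 0.]),    # e2
                     np.array([0., 0., 0., 1.])])   # e4
J = np.linalg.inv(C) @ A @ C   # should equal diag(J_2(3), J_1(2), J_1(2))
```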

12.5 Applications

12.5.1 The Cayley–Hamilton theorem

Recall that the \(\mathbb{K}\)-vector space \(M_{n,n}(\mathbb{K})\) of \(n\times n\)-matrices has dimension \(n^2.\) Therefore, for a matrix \(\mathbf{B}\in M_{n,n}(\mathbb{K})\) the sequence of vectors in \(M_{n,n}(\mathbb{K})\) given by the powers of \(\mathbf{B}\) \[\mathbf{1}_{n},\mathbf{B},\mathbf{B}^2,\mathbf{B}^3,\mathbf{B}^4,\ldots\] must become linearly dependent; indeed, the first \(n^2+1\) of these vectors are already linearly dependent. That is, there must exist coefficients \(a_{i} \in \mathbb{K}\) for \(0\leqslant i\leqslant n^2,\) not all zero, such that \[a_{n^2}\mathbf{B}^{n^2}+a_{n^2-1}\mathbf{B}^{n^2-1}+\cdots+a_2\mathbf{B}^2+a_1\mathbf{B}+a_0\mathbf{1}_{n}=\mathbf{0}_n.\]
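
This dimension-counting argument is constructive: vectorising the powers \(\mathbf{1}_n,\mathbf{B},\mathbf{B}^2,\ldots\) and finding a null vector of the resulting matrix produces such coefficients explicitly. A sketch, assuming Python with numpy (the matrix \(\mathbf{B}\) is our own example):

```python
import numpy as np

# The powers I, B, B^2, ... live in an n^2-dimensional space, so the first
# n^2 + 1 of them are linearly dependent.  We stack the vectorised powers as
# columns and read a null vector off the SVD; its entries are coefficients
# a_0, ..., a_{n^2} with a_0 I + a_1 B + ... + a_{n^2} B^{n^2} = 0.
B = np.array([[2., 1.], [0., 3.]])
n = B.shape[0]
powers = [np.linalg.matrix_power(B, k) for k in range(n**2 + 1)]
M = np.column_stack([P.flatten() for P in powers])   # n^2 x (n^2 + 1) matrix
coeffs = np.linalg.svd(M)[2][-1]                     # unit null vector of M
residual = sum(a * P for a, P in zip(coeffs, powers))  # should be ~ 0
```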

Remark 12.25 • Notation

Let \(a_n,a_{n-1},\ldots,a_1,a_0 \in \mathbb{K}.\) For a polynomial \(p : \mathbb{K}\to \mathbb{K}\) defined by the rule \(p(x)=a_nx^n+a_{n-1}x^{n-1}+\cdots+a_1x+a_0\) for all \(x \in \mathbb{K}\) and a matrix \(\mathbf{B}\in M_{n,n}(\mathbb{K}),\) we define \[p(\mathbf{B})=a_n\mathbf{B}^n+a_{n-1}\mathbf{B}^{n-1}+\cdots+a_1\mathbf{B}+a_0\mathbf{1}_{n}.\] We say a matrix \(\mathbf{B}\in M_{n,n}(\mathbb{K})\) is a zero of the polynomial \(p\) if \(p(\mathbf{B})=\mathbf{0}_n.\)

Above we have seen that every matrix \(\mathbf{B}\in M_{n,n}(\mathbb{K})\) is a zero of a polynomial of degree at most \(n^2.\) One might wonder whether there exists a positive integer \(d\) that is strictly smaller than \(n^2\) so that every \(n\times n\)-matrix is a zero of a polynomial of degree \(d.\)

It turns out that such an integer \(d\) must be at least as big as \(n.\) For scalars \(\lambda_1,\lambda_2,\ldots,\lambda_n\) consider the diagonal matrix \[\mathbf{D}=\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix} \qquad \text{with} \qquad \mathbf{D}^k=\begin{pmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{pmatrix}\] for all \(k \in \mathbb{N}.\) Suppose we can find a polynomial \(p\) of degree at most \(n-1\) such that \(p(\mathbf{D})=\mathbf{0}_n.\) Write \(p(x)=a_{n-1}x^{n-1}+a_{n-2}x^{n-2}+\cdots+a_1x+a_0\) for coefficients \(a_{n-1},\ldots,a_0.\) All off-diagonal entries of \(p(\mathbf{D})\) are zero and for the \(i\)-th diagonal entry of \(p(\mathbf{D})\) we obtain \([p(\mathbf{D})]_{ii}=a_{n-1}\lambda_i^{n-1}+a_{n-2}\lambda_i^{n-2}+\cdots+a_{1}\lambda_i+a_0.\) The equation \(p(\mathbf{D})=\mathbf{0}_n\) is equivalent to the linear system of equations \([p(\mathbf{D})]_{11}=[p(\mathbf{D})]_{22}=\cdots=[p(\mathbf{D})]_{nn}=0\) for the coefficients \(a_{0},a_1,\ldots,a_{n-1}\) and it can be written as \[\begin{pmatrix} 1 & \lambda_1 & (\lambda_1)^2 & \cdots & (\lambda_1)^{n-1} \\ 1 & \lambda_2 & (\lambda_2)^2 & \cdots & (\lambda_2)^{n-1} \\ 1 & \lambda_3 & (\lambda_3)^2 & \cdots & (\lambda_3)^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_n & (\lambda_n)^2 & \cdots & (\lambda_n)^{n-1}\end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1}\end{pmatrix}=0_{\mathbb{K}^n}.\] The matrix on the left hand side is the Vandermonde matrix \(\mathbf{V}_{\vec{\lambda}}\) for the vector \(\vec{\lambda}=(\lambda_i)_{1\leqslant i\leqslant n}.\) Unless \(\det(\mathbf{V}_{\vec{\lambda}})=0,\) there is no non-zero choice of coefficients \(a_0,\ldots,a_{n-1}\) with \(p(\mathbf{D})=\mathbf{0}_n.\) By Example 5.42, we have \[\det(\mathbf{V}_{\vec{\lambda}})=\prod_{1\leqslant i<j\leqslant n} (\lambda_j-\lambda_i)\] and hence if all eigenvalues of \(\mathbf{D}\) are distinct, then \(\det(\mathbf{V}_{\vec{\lambda}})\neq 0.\) It follows that the smallest positive integer \(d,\) so that every \(n\times n\)-matrix is a zero of a polynomial of degree \(d,\) must be at least \(n.\)
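
The Vandermonde determinant formula above is easy to check numerically. A small sketch, assuming Python with numpy (the eigenvalues are our own example):

```python
import numpy as np

# For distinct lambda_1, ..., lambda_n the Vandermonde matrix with rows
# (1, lambda_i, lambda_i^2, ..., lambda_i^{n-1}) is invertible, and its
# determinant equals the product of (lambda_j - lambda_i) over i < j.
lam = np.array([1., 2., 3., 5.])
n = len(lam)
V = np.vander(lam, increasing=True)   # rows (1, lam_i, ..., lam_i^{n-1})
det_V = np.linalg.det(V)
expected = np.prod([lam[j] - lam[i]
                    for i in range(n) for j in range(i + 1, n)])
```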

For every \(n\times n\)-matrix \(\mathbf{A}\) we can indeed always find a polynomial \(p\) of degree \(n,\) so that \(p(\mathbf{A})=\mathbf{0}_n\):

Theorem 12.26 • Cayley–Hamilton theorem

Every matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is a zero of its characteristic polynomial \(\operatorname{char}_\mathbf{A}: \mathbb{K}\to \mathbb{K},\) that is, \[\operatorname{char}_\mathbf{A}(\mathbf{A})=\mathbf{0}_n.\]

Example 12.27

Recall from Remark 6.42 that for \(\mathbf{A}\in M_{2,2}(\mathbb{K})\) we have \(\operatorname{char}_\mathbf{A}(\lambda)=\lambda^2-\operatorname{Tr}(\mathbf{A})\lambda+\det(\mathbf{A}).\) Thus, Theorem 12.26 implies that for all \(\mathbf{A}\in M_{2,2}(\mathbb{K})\) we have \[\mathbf{A}^2-\operatorname{Tr}(\mathbf{A})\mathbf{A}+\det(\mathbf{A})\mathbf{1}_{2}=\mathbf{0}_2.\] For an invertible \(2\times 2\)-matrix \(\mathbf{A}\) we may write \(\mathbf{1}_{2}=\mathbf{A}\mathbf{A}^{-1}\) so that \[\mathbf{A}(\mathbf{A}-\operatorname{Tr}(\mathbf{A})\mathbf{1}_{2}+\det(\mathbf{A})\mathbf{A}^{-1})=\mathbf{0}_2\] and hence \[\mathbf{A}^{-1}=\frac{1}{\det \mathbf{A}}\left(\operatorname{Tr}(\mathbf{A})\mathbf{1}_{2}-\mathbf{A}\right)\] which can of course also be verified by direct computation.
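
The \(2\times 2\) identity of Example 12.27 and the resulting inverse formula can be sketched numerically as follows (assuming Python with numpy; the matrix \(\mathbf{A}\) is our own example):

```python
import numpy as np

# Every 2 x 2 matrix satisfies A^2 - tr(A) A + det(A) I = 0, and for
# invertible A this yields A^{-1} = (tr(A) I - A) / det(A).
A = np.array([[3., 1.], [4., 2.]])
tr, det = np.trace(A), np.linalg.det(A)
cayley = A @ A - tr * A + det * np.eye(2)   # should be the zero matrix
A_inv = (tr * np.eye(2) - A) / det          # inverse via Cayley-Hamilton
```
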
Remark 12.28

It is tempting to argue that \[\operatorname{char}_{\mathbf{A}}(\mathbf{A})=\det(\mathbf{A}\mathbf{1}_{n}-\mathbf{A})=0.\] Notice however that \(\operatorname{char}_{\mathbf{A}}(\mathbf{A})\) is an \(n\times n\)-matrix, whereas \(\det(\mathbf{A}\mathbf{1}_{n}-\mathbf{A})\) is a scalar, so the previous equation makes no sense if \(n>1.\)

That this incorrect calculation gives the correct answer is an accident. To see this observe that for any function \(h : M_{n,n}(\mathbb{K}) \to \mathbb{K}\) and any \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) we obtain a function \[q : \mathbb{K}\to \mathbb{K}, \qquad x \mapsto h(x\mathbf{1}_{n}-\mathbf{A}).\] If \(h\) is a polynomial in the entries of the input matrix, the function \(q\) is a polynomial \(p_{\mathbf{A}} : \mathbb{K}\to \mathbb{K}\) depending on \(\mathbf{A},\) so that \(q(x)=p_{\mathbf{A}}(x)\) for all \(x \in \mathbb{K}.\) Arguing (wrongly!) as before we would expect that \(p_{\mathbf{A}}(\mathbf{A})=\mathbf{0}_n.\) This is however not true in general. Consider for instance \[h : M_{2,2}(\mathbb{K}) \to \mathbb{K}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto bd\] so that for \[\mathbf{A}=\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\] we have \[q(x)=p_{\mathbf{A}}(x)=-A_{12}(x-A_{22})\] and hence \[p_{\mathbf{A}}(\mathbf{A})=-A_{12}\mathbf{A}+A_{12}A_{22}\mathbf{1}_{2}.\] For \[\mathbf{A}=\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\] we thus obtain \[p_{\mathbf{A}}(\mathbf{A})=-1 \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}+1\cdot 0\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}=\begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}\neq \mathbf{0}_2.\]
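
The counterexample of Remark 12.28 can also be run numerically. A sketch, assuming Python with numpy (the helper `h` mirrors the function \(h\) of the remark):

```python
import numpy as np

# For h([[a, b], [c, d]]) = b * d the polynomial p_A(x) = h(x * I - A)
# evaluated at the matrix A itself need not vanish.
def h(M):
    """The function h from Remark 12.28: product of entries b and d."""
    return M[0, 1] * M[1, 1]

A = np.array([[0., 1.], [0., 0.]])
# p_A(x) = -A12 * (x - A22), hence p_A(A) = -A12 * A + A12 * A22 * I:
p_of_A = -A[0, 1] * A + A[0, 1] * A[1, 1] * np.eye(2)  # nonzero matrix
```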

Proof of Theorem 12.26. Let \(\mathbf{B}\in M_{n,n}(\mathbb{K}).\) Recall that \(\operatorname{char}_\mathbf{B}(x)=\det(x\mathbf{1}_{n}-\mathbf{B})\) for all \(x\in \mathbb{K}.\) Using the product rule Proposition 5.21, for an invertible \(n\times n\)-matrix \(\mathbf{C}\) we thus obtain \[\begin{aligned} \det(\mathbf{C}(x\mathbf{1}_{n}-\mathbf{B})\mathbf{C}^{-1})&=\det(\mathbf{C})\det((x\mathbf{1}_{n}-\mathbf{B})\mathbf{C}^{-1})=\det(\mathbf{C})\det(x\mathbf{1}_{n}-\mathbf{B})\det(\mathbf{C}^{-1})\\ &=\det(x\mathbf{1}_{n}-\mathbf{B})=\det(x\mathbf{1}_{n}-\mathbf{C}\mathbf{B}\mathbf{C}^{-1}) \end{aligned}\] and hence conjugate matrices have the same characteristic polynomial, that is \[\tag{12.7} \operatorname{char}_\mathbf{B}(x)=\operatorname{char}_{\mathbf{C}\mathbf{B}\mathbf{C}^{-1}}(x)\] for all \(x\in \mathbb{K}.\) We first consider the case \(\mathbb{K}=\mathbb{C}.\) Let \(\mathbf{A}\in M_{n,n}(\mathbb{C})\) be an \(n\times n\)-matrix with complex entries. By Theorem 12.12, there exists an ordered basis \(\mathbf{b}\) of \(\mathbb{C}^n,\) an integer \(k\geqslant 1,\) integers \(n_1,\ldots,n_k\) and complex numbers \(\lambda_1,\ldots,\lambda_k\) such that \(\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})=\mathbf{B},\) where we write \[\mathbf{B}=\operatorname{diag}(\mathbf{J}_{n_1}(\lambda_1),\mathbf{J}_{n_2}(\lambda_2),\ldots,\mathbf{J}_{n_k}(\lambda_k)).\] Let \(\mathbf{e}\) denote the standard ordered basis of \(\mathbb{C}^n\) so that \(\mathbf{M}(f_\mathbf{A},\mathbf{e},\mathbf{e})=\mathbf{A}\) and let \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{e})\) denote the change of basis matrix. By Theorem 3.107 we have \(\mathbf{A}=\mathbf{C}\mathbf{B}\mathbf{C}^{-1}.\) We want to show that \[\mathbf{0}_n=\operatorname{char}_\mathbf{A}(\mathbf{A})=\operatorname{char}_{\mathbf{C}\mathbf{B}\mathbf{C}^{-1}}(\mathbf{C}\mathbf{B}\mathbf{C}^{-1})=\operatorname{char}_\mathbf{B}(\mathbf{C}\mathbf{B}\mathbf{C}^{-1}),\] where the third equality uses (12.7). 
By induction one shows that \((\mathbf{C}\mathbf{B}\mathbf{C}^{-1})^k=\mathbf{C}\mathbf{B}^k\mathbf{C}^{-1}\) for all \(k \in \mathbb{N}\cup\{0\}.\) Therefore, we obtain \[\operatorname{char}_\mathbf{B}(\mathbf{C}\mathbf{B}\mathbf{C}^{-1})=\mathbf{C}\operatorname{char}_\mathbf{B}(\mathbf{B})\mathbf{C}^{-1}\] and hence – since \(\mathbf{C}\) is invertible – we have \[\mathbf{0}_n=\operatorname{char}_\mathbf{A}(\mathbf{A}) \qquad \iff \qquad \mathbf{0}_n=\operatorname{char}_\mathbf{B}(\mathbf{B}).\] It is thus sufficient to show that \(\operatorname{char}_\mathbf{B}(\mathbf{B})=\mathbf{0}_n.\) A Jordan block is an upper triangular matrix and hence a block diagonal matrix consisting of Jordan blocks is an upper triangular matrix as well. Proposition 5.24 thus shows that \[\operatorname{char}_\mathbf{B}(x)=(x-\lambda_1)^{n_1}(x-\lambda_2)^{n_2}\cdots(x-\lambda_k)^{n_k}\] for all \(x\in \mathbb{C}\) and hence \[\operatorname{char}_\mathbf{B}(\mathbf{B})=(\mathbf{B}-\lambda_1\mathbf{1}_{n})^{n_1}(\mathbf{B}-\lambda_2\mathbf{1}_{n})^{n_2}\cdots(\mathbf{B}-\lambda_k\mathbf{1}_{n})^{n_k}.\] Since \(\mathbf{B}\mathbf{1}_{n}=\mathbf{1}_{n}\mathbf{B},\) we can rearrange factors in the expression for \(\operatorname{char}_\mathbf{B}(\mathbf{B})\) so that for each \(1\leqslant i\leqslant k,\) \[\begin{gathered} \operatorname{char}_\mathbf{B}(\mathbf{B})=(\mathbf{B}-\lambda_1\mathbf{1}_{n})^{n_1}\cdots (\mathbf{B}-\lambda_{i-1}\mathbf{1}_{n})^{n_{i-1}}(\mathbf{B}-\lambda_{i+1}\mathbf{1}_{n})^{n_{i+1}}\cdots\\ \cdots (\mathbf{B}-\lambda_k\mathbf{1}_{n})^{n_k}(\mathbf{B}-\lambda_i\mathbf{1}_{n})^{n_i}. \end{gathered}\] Now observe that \[\begin{gathered} \mathbf{B}-\lambda_i\mathbf{1}_{n}=\operatorname{diag}(\mathbf{J}_{n_1}(\lambda_1-\lambda_i),\ldots, \mathbf{J}_{n_{i-1}}(\lambda_{i-1}-\lambda_i),\mathbf{J}_{n_i}(0),\mathbf{J}_{n_{i+1}}(\lambda_{i+1}-\lambda_i),\ldots\\ \ldots,\mathbf{J}_{n_k}(\lambda_k-\lambda_i)).
\end{gathered}\] By (12.5), we have \((\mathbf{J}_{n_i}(0))^{n_i}=\mathbf{0}_{n_i}\) and hence \[(\mathbf{B}-\lambda_i\mathbf{1}_{n})^{n_i}=\operatorname{diag}(\ldots,(\mathbf{J}_{n_i}(0))^{n_i},\ldots)=\operatorname{diag}(\ldots,\mathbf{0}_{n_i},\ldots).\] Therefore, the matrix \((\mathbf{B}-\lambda_i\mathbf{1}_{n})^{n_i}\) contains a zero block of size \(n_i\) after a diagonal block of size \(n_1+n_2+\cdots+n_{i-1}.\) This shows that \(\operatorname{char}_\mathbf{B}(\mathbf{B})\vec{e}_{j}=0_{\mathbb{C}^n}\) for \[n_1+n_2+\cdots+n_{i-1}<j \leqslant n_1+n_2+\cdots+n_{i-1}+n_{i}.\] Since \(\operatorname{char}_\mathbf{B}(\mathbf{B})\vec{e}_{j}\) equals the \(j\)-th column vector of \(\operatorname{char}_\mathbf{B}(\mathbf{B}),\) it follows that \(\operatorname{char}_\mathbf{B}(\mathbf{B})=\mathbf{0}_n.\)

Finally, for \(\mathbb{K}=\mathbb{R}\) (or in fact any subfield of \(\mathbb{C}\)) the claim follows by interpreting the entries of \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) as complex numbers.
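
The Cayley–Hamilton theorem is also easy to test numerically on a concrete matrix. A sketch, assuming Python with numpy (the matrix \(\mathbf{A}\) is our own example; `np.poly` returns the coefficients of \(\det(x\mathbf{1}_n-\mathbf{A})\) with the leading coefficient first):

```python
import numpy as np

# Numerical check of Cayley-Hamilton: evaluate the characteristic
# polynomial at the matrix A itself using Horner's scheme.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
coeffs = np.poly(A)             # characteristic polynomial, leading first
result = np.zeros_like(A)
for c in coeffs:                # Horner: result <- result @ A + c * I
    result = result @ A + c * np.eye(3)
# result is char_A(A) and should vanish up to rounding error
```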

12.5.2 A matrix is similar to its transpose

Let \(\lambda \in \mathbb{K}\) and \(n \in \mathbb{N}.\) Observe that the matrix representation of \(f_{\mathbf{J}_n(\lambda)} : \mathbb{K}^n \to \mathbb{K}^n\) with respect to the ordered basis \(\mathbf{b}^{\prime}=(\vec{e}_n,\vec{e}_{n-1},\ldots,\vec{e}_{2},\vec{e}_1)\) of \(\mathbb{K}^{n}\) satisfies \(\mathbf{M}(f_{\mathbf{J}_n(\lambda)},\mathbf{b}^{\prime},\mathbf{b}^{\prime})=(\mathbf{J}_n(\lambda))^T.\) Writing \(\mathbf{e}\) for the standard ordered basis of \(\mathbb{K}^n,\) this shows that a Jordan block is similar to its transpose, that is, \[(\mathbf{J}_n(\lambda))^T=\mathbf{C}(\mathbf{e},\mathbf{b}^{\prime})\mathbf{J}_n(\lambda)\mathbf{C}(\mathbf{e},\mathbf{b}^{\prime})^{-1}\] by Theorem 3.107. Using the Jordan normal form, we obtain:
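
Concretely, the change of basis to the reversed basis is the "flip" permutation matrix with ones on the anti-diagonal, and conjugating by it transposes a Jordan block. A sketch, assuming Python with numpy:

```python
import numpy as np

# Reversing the basis order corresponds to conjugating by the flip
# permutation matrix P (ones on the anti-diagonal); since P is its own
# inverse, P J P^{-1} = P J P, and this equals J^T for a Jordan block J.
n, lam = 4, 2.0
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # Jordan block J_n(lam)
P = np.fliplr(np.eye(n))                             # flip matrix, P = P^{-1}
flipped = P @ J @ P                                  # should equal J^T
```
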
Corollary 12.29

Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{C}).\) Then \(\mathbf{A}\) and \(\mathbf{A}^T\) are similar, that is, there exists an invertible matrix \(\mathbf{X}\in M_{n,n}(\mathbb{C})\) such that \(\mathbf{A}^T=\mathbf{X}\mathbf{A}\mathbf{X}^{-1}.\)

Proof. By the Jordan normal form theorem there exists an integer \(\ell \in \mathbb{N},\) integers \(n_1,\ldots,n_{\ell}\) and complex numbers \(\lambda_1,\ldots,\lambda_\ell\) such that \(\mathbf{A}\) is similar to the matrix \[\mathbf{B}=\operatorname{diag}(\mathbf{J}_{n_1}(\lambda_1),\mathbf{J}_{n_2}(\lambda_2),\ldots,\mathbf{J}_{n_\ell}(\lambda_\ell)).\] That is, there exists an invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{C})\) such that \(\mathbf{A}=\mathbf{C}\mathbf{B}\mathbf{C}^{-1}.\) Each Jordan block is similar to its transpose; for \(1\leqslant i\leqslant \ell\) we can thus find invertible matrices \(\mathbf{Y}_i \in M_{n_i,n_i}(\mathbb{C})\) such that \[(\mathbf{J}_{n_i}(\lambda_i))^T=\mathbf{Y}_i\mathbf{J}_{n_i}(\lambda_i)\mathbf{Y}_i^{-1}.\] The invertible block diagonal matrix \(\mathbf{Y}=\operatorname{diag}(\mathbf{Y}_1,\ldots,\mathbf{Y}_\ell)\) thus satisfies \[\mathbf{Y}\mathbf{B}\mathbf{Y}^{-1}=\operatorname{diag}((\mathbf{J}_{n_1}(\lambda_1))^T,(\mathbf{J}_{n_2}(\lambda_2))^T,\ldots,(\mathbf{J}_{n_{\ell}}(\lambda_\ell))^T)=\mathbf{B}^T.\] Since \(\mathbf{A}=\mathbf{C}\mathbf{B}\mathbf{C}^{-1},\) we obtain \[\begin{aligned} \mathbf{A}^T&=(\mathbf{C}^{-1})^T\mathbf{B}^T\mathbf{C}^T=(\mathbf{C}^{-1})^T\mathbf{Y}\mathbf{B}\mathbf{Y}^{-1}\mathbf{C}^T=(\mathbf{C}^{-1})^T\mathbf{Y}\mathbf{C}^{-1}\mathbf{C}\mathbf{B}\mathbf{C}^{-1}\mathbf{C}\mathbf{Y}^{-1}\mathbf{C}^T\\ &=\mathbf{X}\mathbf{A}\mathbf{X}^{-1}, \end{aligned}\] where \(\mathbf{X}=(\mathbf{C}^{-1})^T\mathbf{Y}\mathbf{C}^{-1}.\)
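
The block-diagonal step of the proof can be sketched numerically (assuming Python with numpy; the helpers `jordan` and `block_diag` and the particular blocks are our own choices): flipping each Jordan block separately yields an invertible \(\mathbf{Y}\) with \(\mathbf{Y}\mathbf{B}\mathbf{Y}^{-1}=\mathbf{B}^T.\)

```python
import numpy as np

def jordan(n, lam):
    """Jordan block J_n(lam): lam on the diagonal, ones on the superdiagonal."""
    return lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)

def block_diag(*blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

B = block_diag(jordan(2, 1.0), jordan(3, 5.0))
Y = block_diag(np.fliplr(np.eye(2)), np.fliplr(np.eye(3)))  # flip each block
conj = Y @ B @ np.linalg.inv(Y)   # should equal B^T
```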
