12 The Jordan normal form
12.1 Generalised eigenvectors and eigenspaces
Let \(f : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Recall from Proposition 6.46 that the eigenspaces of \(f\) are in direct sum. Denoting by \(\lambda_1,\ldots,\lambda_k\) the distinct eigenvalues of \(f,\) we have \[\tag{12.1} \operatorname{Eig}_f(\lambda_1)\oplus \operatorname{Eig}_f(\lambda_2)\oplus \cdots\oplus \operatorname{Eig}_f(\lambda_k)=V \quad \iff \quad f\;\text{is diagonalisable}.\] Not every endomorphism is diagonalisable, so the equality on the left hand side of (12.1) does not hold in general. We would like to remedy this by replacing each eigenspace in (12.1) with a suitable notion of generalised eigenspace. The idea is to consider “eigenvectors of higher rank”. For an endomorphism \(f : V \to V\) and \(k \in \mathbb{N},\) we write \[f^k=\underbrace{f\circ f \circ \cdots \circ f}_{k\ \text{times}}\qquad \text{and define} \qquad f^0=\mathrm{Id}_V.\]
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) A non-zero vector \(v \in V\) is called a generalised eigenvector of \(f\) with eigenvalue \(\lambda \in \mathbb{K}\) if \[(f-\lambda\mathrm{Id}_V)^m(v)=0_V\] for some integer \(m\in \mathbb{N}.\) If a generalised eigenvector \(v\) satisfies \((f-\lambda\mathrm{Id}_V)^m(v)=0_V\) and \((f-\lambda\mathrm{Id}_V)^{m-1}(v)\neq 0_V,\) then \(v\) is said to have rank \(m\).
Notice that a generalised eigenvector of \(f : V \to V\) of rank \(1\) and with eigenvalue \(\lambda\) satisfies \[(f-\lambda\mathrm{Id}_V)(v)=0_V \qquad \text{and}\qquad \mathrm{Id}_V(v)\neq 0_V.\]Equivalently, \[f(v)=\lambda v \qquad \text{and}\qquad v \neq 0_V.\] Generalised eigenvectors of rank \(1\) are thus precisely the usual eigenvectors.
The right definition of a generalised eigenspace is a bit more subtle.
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) For all \(\lambda \in \mathbb{K}\) we define the generalised \(\lambda\)-eigenspace of \(f\) to be the set \[\mathcal{E}_f(\lambda)=\bigcup_{k=0}^{\infty}\operatorname{Ker}((f-\lambda\mathrm{Id}_V)^k).\]
The previous definition, while convenient for proofs, is not particularly handy for computations. Observe however that if \(g : V \to V\) is an endomorphism of a \(\mathbb{K}\)-vector space \(V,\) then \[\{0_V\}=\operatorname{Ker}(g^0)\subset \operatorname{Ker}(g^1)\subset \operatorname{Ker}(g^2)\subset \operatorname{Ker}(g^3) \subset \cdots\] and correspondingly we have \[0\leqslant\dim \operatorname{Ker}(g) \leqslant \dim \operatorname{Ker}(g^2)\leqslant \dim \operatorname{Ker}(g^3) \leqslant \cdots\] Now take \(g=f-\lambda\mathrm{Id}_V.\) If \(V\) is finite dimensional, then \(\dim \operatorname{Ker}((f-\lambda\mathrm{Id}_V)^k)\) is at most \(\dim V\) for all \(k \in \mathbb{N},\) so this non-decreasing sequence of integers is eventually constant. Nested subspaces of equal finite dimension coincide, so the kernels themselves are eventually equal, and therefore there exists an integer \(m \in \mathbb{N}\) so that the generalised \(\lambda\)-eigenspace of \(f\) satisfies \[\mathcal{E}_f(\lambda)=\operatorname{Ker}((f-\lambda\mathrm{Id}_V)^m).\]
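The stabilisation of the kernel dimensions can be observed numerically. The following is a minimal sketch in Python, using only the standard library and exact rational arithmetic. The helper names `mat_mul`, `rank` and `kernel_dims`, as well as the test matrix, are our own choices for illustration and do not appear elsewhere in these notes. The dimension of each kernel is obtained from the rank via the rank-nullity theorem.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(m):
    """Rank via Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def kernel_dims(a, lam):
    """List of dim Ker((A - lam*1)^k) for k = 0, 1, ..., n."""
    n = len(a)
    g = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # g^0
    dims = []
    for _ in range(n + 1):
        dims.append(n - rank(power))  # rank-nullity theorem
        power = mat_mul(power, g)
    return dims

# Hypothetical test matrix with eigenvalues 2 and 5 (chosen for illustration).
A = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]

print(kernel_dims(A, 2))  # [0, 1, 2, 3, 3]
print(kernel_dims(A, 5))  # [0, 1, 1, 1, 1]
```

For this matrix the sequence for \(\lambda=2\) becomes constant from \(k=3\) onwards, so one may take \(m=3,\) whereas for \(\lambda=5\) already \(m=1\) suffices.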
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) Then \(\mathcal{E}_f(\lambda)\neq \{0_V\}\) if and only if \(\lambda\) is an eigenvalue of \(f.\)
Proof. If \(\lambda\) is an eigenvalue of \(f\) then there exists a non-zero vector \(v \in \operatorname{Ker}(f-\lambda\mathrm{Id}_V)\subset \mathcal{E}_f(\lambda)\) so that \(\mathcal{E}_f(\lambda)\neq \{0_V\}.\) Conversely, suppose \(\mathcal{E}_f(\lambda)\neq \{0_V\}\) so that there exists an integer \(m\) and a non-zero vector \(v \in V\) such that \((f-\lambda\mathrm{Id}_V)^m(v)=0_V.\) We may assume \(m\) to be the smallest such integer, and then \(m\geqslant 1\) since \(v\neq 0_V.\) By minimality, \(w=(f-\lambda\mathrm{Id}_V)^{m-1}(v)\neq 0_V.\) Moreover \((f-\lambda\mathrm{Id}_V)(w)=(f-\lambda\mathrm{Id}_V)^m(v)=0_V,\) that is, \(f(w)=\lambda w,\) and hence \(w\) is an eigenvector of \(f\) with eigenvalue \(\lambda.\)
By a generalised eigenvector or generalised eigenspace of a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) we mean those of \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\)
Consider \[\mathbf{A}=\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}.\] The characteristic polynomial of \(\mathbf{A}\) is \(\operatorname{char}_\mathbf{A}(\lambda)=(\lambda-3)^2,\) hence we have a single eigenvalue \(3\) of algebraic multiplicity \(2.\) A simple calculation gives that \(\operatorname{Eig}_\mathbf{A}(3)=\operatorname{span}\{\vec{e}_1\}.\) Now \[(\mathbf{A}-3\cdot\mathbf{1}_{2})^2=\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2=\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},\] hence \(\vec{e}_2\) satisfies \((\mathbf{A}-3\cdot\mathbf{1}_{2})^2\vec{e}_2=0_{\mathbb{K}^2}\) and \((\mathbf{A}-3\cdot\mathbf{1}_{2})\vec{e}_2=\vec{e}_1\neq 0_{\mathbb{K}^2}.\) Therefore, \(\vec{e}_2\) is a generalised eigenvector of \(\mathbf{A}\) of rank \(2\) with eigenvalue \(3.\) We thus have \(\mathcal{E}_{\mathbf{A}}(3)=\operatorname{span}\{\vec{e}_1,\vec{e}_2\}=\mathbb{K}^2.\)
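The rank of a generalised eigenvector in the example above can also be checked mechanically. The following Python sketch uses integer entries and the standard library only. The function names `mat_vec` and `generalised_rank` are hypothetical and introduced purely for illustration. The search stops after \(n\) steps, which relies on the standard fact that the chain of kernels of an \(n\times n\) matrix is stationary after at most \(n\) steps.

```python
def mat_vec(a, v):
    """Apply the square matrix a (list of rows) to the vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def generalised_rank(a, lam, v):
    """Smallest m >= 1 with (A - lam*1)^m v = 0, or None if v is zero
    or no such m <= n exists (assumption: searching up to m = n suffices)."""
    n = len(a)
    g = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    if not any(v):
        return None  # the zero vector is excluded by definition
    w = list(v)
    for m in range(1, n + 1):
        w = mat_vec(g, w)  # w = (A - lam*1)^m v
        if not any(w):
            return m
    return None

A = [[3, 1],
     [0, 3]]
e1, e2 = [1, 0], [0, 1]

print(generalised_rank(A, 3, e1))  # 1, an ordinary eigenvector
print(generalised_rank(A, 3, e2))  # 2, as computed in the example
print(generalised_rank(A, 2, e1))  # None, since 2 is not an eigenvalue of A
```

A return value of `1` corresponds to an ordinary eigenvector, in agreement with the observation that generalised eigenvectors of rank \(1\) are precisely the usual eigenvectors.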
Recall that an eigenspace of an endomorphism \(f : V \to V\) is a subspace of \(V\) that is stable under \(f.\) The same holds true for generalised eigenspaces.
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V\) and \(\lambda \in \mathbb{K}.\) Then \(\mathcal{E}_{f}(\lambda)\) is a subspace of \(V\) that is stable under \(f.\)
Proof. By definition, the zero vector \(0_V\) is an element of \(\mathcal{E}_f(\lambda),\) hence \(\mathcal{E}_{f}(\lambda)\) is non-empty. Let \(t_1,t_2 \in \mathbb{K}\) and \(v_1,v_2 \in \mathcal{E}_{f}(\lambda).\) Then there exist \(k_1,k_2\) such that \((f-\lambda\mathrm{Id}_V)^{k_1}(v_1)=0_V\) and \((f-\lambda\mathrm{Id}_V)^{k_2}(v_2)=0_V.\) Take \(k\) to be the maximum of \(\{k_1,k_2\}.\) Then, using the linearity of \(f-\lambda\mathrm{Id}_V\) and its powers, we compute \[\begin{aligned} 0_V&=t_1(f-\lambda\mathrm{Id}_V)^{k-k_1}(0_V)+t_2(f-\lambda\mathrm{Id}_V)^{k-k_2}(0_V)\\ &=t_1(f-\lambda\mathrm{Id}_V)^{k-k_1}((f-\lambda\mathrm{Id}_V)^{k_1}(v_1))+t_2(f-\lambda\mathrm{Id}_V)^{k-k_2}((f-\lambda\mathrm{Id}_V)^{k_2}(v_2))\\ &=t_1(f-\lambda\mathrm{Id}_V)^k(v_1)+t_2(f-\lambda\mathrm{Id}_V)^k(v_2)=(f-\lambda\mathrm{Id}_V)^k(t_1v_1+t_2v_2) \end{aligned}\] so that \(t_1v_1+t_2v_2 \in \operatorname{Ker}((f-\lambda\mathrm{Id}_V)^k) \subset \mathcal{E}_f(\lambda)\) and hence \(\mathcal{E}_f(\lambda)\) is a subspace by Definition 3.21.
We now show that \(\mathcal{E}_f(\lambda)\) is stable under \(f.\) Let \(v \in \mathcal{E}_f(\lambda)\) so that there exists \(k\geqslant 0\) with \((f-\lambda\mathrm{Id}_V)^k(v)=0_V.\) Write \(w=f(v).\) Then we obtain \[\begin{aligned} (f-\lambda\mathrm{Id}_V)^{k}(w)&=(f-\lambda\mathrm{Id}_V)^{k}(f(v)-\lambda v+\lambda v)\\ &=(f-\lambda\mathrm{Id}_V)^k(f(v)-\lambda v)+\lambda (f-\lambda\mathrm{Id}_V)^{k}(v)\\ &=(f-\lambda\mathrm{Id}_V)^{k+1}(v)+\lambda (f-\lambda\mathrm{Id}_V)^k(v)=0_V. \end{aligned}\] Therefore \(w=f(v) \in \mathcal{E}_f(\lambda)\) and hence \(\mathcal{E}_f(\lambda)\) is stable under \(f.\)
As for usual eigenspaces, generalised eigenspaces are also in direct sum:
Let \(f : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then the generalised eigenspaces of \(f\) are in direct sum.
Proof. Let \(\lambda_1,\ldots,\lambda_k\) be distinct eigenvalues of \(f\) and let \(n_i\) for \(1\leqslant i\leqslant k\) be such that \(\mathcal{E}_f(\lambda_i)=\operatorname{Ker}((f-\lambda_i\mathrm{Id}_V)^{n_i}).\) For \(1\leqslant i\leqslant k\) let \(v_i,\hat{v}_i \in \mathcal{E}_{f}(\lambda_i)\) be such that \[\tag{12.2} v_1+v_2+\cdots+v_k=\hat{v}_1+\hat{v}_2+\cdots+\hat{v}_k\] We want to show that \(w_i=v_i-\hat{v}_i=0_V\) for all \(1\leqslant i\leqslant k.\) For \(1\leqslant i\leqslant k\) consider the endomorphism \[g_i=(f-\lambda_1\mathrm{Id}_V)^{n_1}\circ \cdots\circ (f-\lambda_{i-1}\mathrm{Id}_V)^{n_{i-1}}\circ (f-\lambda_{i+1}\mathrm{Id}_V)^{n_{i+1}}\circ \cdots\circ (f-\lambda_k\mathrm{Id}_V)^{n_k}.\] Notice that \(g_i\) does not contain the mapping \((f-\lambda_i\mathrm{Id}_V)^{n_i}.\) For \(i \neq j\) the mapping \(g_i\) contains \((f-\lambda_j\mathrm{Id}_V)^{n_j}.\) Rearranging the mappings in \(g_i\) if necessary, we can assume that \(g_i=h\circ (f-\lambda_j\mathrm{Id}_V)^{n_j}\) for some endomorphism \(h.\) Rearranging does not change \(g_i\) since for all \(\mu_1,\mu_2 \in \mathbb{K}\) we have \[(f-\mu_1\mathrm{Id}_V)\circ (f-\mu_2\mathrm{Id}_V)=(f-\mu_2\mathrm{Id}_V)\circ (f-\mu_1\mathrm{Id}_V).\] Since \(w_j \in \mathcal{E}_{f}(\lambda_j)=\operatorname{Ker}((f-\lambda_j\mathrm{Id}_V)^{n_j})\) we thus conclude that \(g_i(w_j)=0_V.\)
By Lemma 12.6 the subspace \(\mathcal{E}_f(\lambda_i)\) is stable under \(f\) and hence it is also stable under \(f-\mu\mathrm{Id}_V\) for all \(\mu \in \mathbb{K}.\) This implies that \(\mathcal{E}_f(\lambda_i)\) is also stable under \(g_i.\) Write (12.2) as \[w_1+w_2+\cdots +w_k=0_V.\] Applying the endomorphism \(g_i\) to the previous equation and using that \(g_i(w_j)=0_V\) for \(i\neq j,\) we obtain that \(g_i(w_i)=0_V.\) We claim that for \(j\neq i\) the scalar \(\lambda_j\) is not an eigenvalue of \(f|_{\mathcal{E}_f(\lambda_i)}.\) Indeed, if \(w \in \mathcal{E}_f(\lambda_i)\) satisfies \(f(w)=\lambda_j w,\) then \(0_V=(f-\lambda_i\mathrm{Id}_V)^{n_i}(w)=(\lambda_j-\lambda_i)^{n_i}w\) and hence \(w=0_V\) since \(\lambda_j\neq \lambda_i.\) Therefore each factor \((f-\lambda_j\mathrm{Id}_V)^{n_j}\) of \(g_i\) restricts to an injective, and hence invertible, endomorphism of the finite dimensional space \(\mathcal{E}_f(\lambda_i),\) and so the restriction of \(g_i\) to \(\mathcal{E}_f(\lambda_i)\) is invertible as well. Since \(g_i(w_i)=0_V,\) this implies that \(w_i=0_V.\) Since \(i\) is arbitrary, we have \(w_1=w_2=\cdots=w_k=0_V,\) as desired.
We now obtain the desired improvement of (12.1), which holds over \(\mathbb{C}\) without assuming that \(f\) is diagonalisable.
Let \(f : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{C}\)-vector space \(V\) of dimension \(n\geqslant 1\) and let \(\lambda_1,\ldots,\lambda_k\) denote the distinct eigenvalues of \(f.\) Then we have \[\mathcal{E}_f(\lambda_1)\oplus \mathcal{E}_f(\lambda_2) \oplus \cdots \oplus \mathcal{E}_f(\lambda_k)=V.\]
Proof. Let \(U=\mathcal{E}_f(\lambda_1)\oplus \mathcal{E}_f(\lambda_2) \oplus \cdots \oplus \mathcal{E}_f(\lambda_k)\) and suppose that \(U\neq V.\) Then, by Corollary 6.11 there exists a complement \(U^{\prime}\) of \(U\) with \(\dim U^{\prime}\geqslant 1.\) Let \(\Pi : V \to U^{\prime}\) denote the projection onto \(U^{\prime}\) with kernel \(U\) and consider the endomorphism \(\hat{f}=\Pi \circ f|_{U^{\prime}} : U^{\prime} \to U^{\prime}.\) Since we work over the complex numbers and since \(\dim U^{\prime}\geqslant 1,\) Theorem 6.49 implies that \(\hat{f}\) admits an eigenvalue \(\mu.\) Let \(v \in U^{\prime}\) be a corresponding eigenvector of \(\hat{f},\) so that in particular \(v\neq 0_V.\) Since \(U=\operatorname{Ker}\Pi\) is a complement of \(U^{\prime},\) we obtain \[f(v)=\mu v+u\] for some vector \(u \in U.\) We can write \(u=\sum_{i=1}^k u_i\) with \(u_i \in \mathcal{E}_{f}(\lambda_i).\) Now define \(g=f-\mu\mathrm{Id}_V : V \to V\) so that \[g(v)=\sum_{i=1}^k u_i.\] Suppose \(1\leqslant i\leqslant k\) is such that \(\lambda_i\neq \mu.\) Then \(\mu\) is not an eigenvalue of the restriction of \(f\) to \(\mathcal{E}_f(\lambda_i)\): if \(w \in \mathcal{E}_f(\lambda_i)\) satisfies \(f(w)=\mu w,\) then \(0_V=(f-\lambda_i\mathrm{Id}_V)^m(w)=(\mu-\lambda_i)^m w\) for some \(m \in \mathbb{N}\) and hence \(w=0_V.\) By Lemma 12.6 the subspace \(\mathcal{E}_f(\lambda_i)\) is stable under \(g=f-\mu\mathrm{Id}_V,\) hence the restriction of \(g\) to \(\mathcal{E}_f(\lambda_i)\) is injective and therefore invertible as an endomorphism of the finite dimensional space \(\mathcal{E}_f(\lambda_i),\) so there exists a vector \(v_i \in \mathcal{E}_f(\lambda_i)\) such that \(g(v_i)=u_i.\) If \(\lambda_i\neq \mu\) for all \(1\leqslant i\leqslant k,\) then we obtain \[g\left(v-\sum_{i=1}^kv_i\right)=0_V\] so that \(v-\sum_{i=1}^k v_i\) is an element of \(\operatorname{Ker}g=\operatorname{Ker}(f-\mu\mathrm{Id}_V)=\{0_V\},\) where the last equality follows since \(\mu\) is not an eigenvalue of \(f.\) We can therefore write \(v=\sum_{i=1}^k v_i \in U,\) but this contradicts the fact that \(v\) is a non-zero element of \(U^{\prime}\) and \(U\cap U^{\prime}=\{0_V\}.\)
We conclude that we can find an integer \(i\) with \(1\leqslant i\leqslant k\) such that \(\lambda_i=\mu.\) After possibly renumbering the eigenvalues we can assume that \(\lambda_1=\mu\) and hence that \(\lambda_i\neq \mu\) for \(2\leqslant i\leqslant k,\) since the eigenvalues are distinct. So again for \(2\leqslant i\leqslant k\) we have vectors \(v_i \in \mathcal{E}_{f}(\lambda_i)\) such that \(g(v_i)=u_i.\) We thus have \[g\left(v-\sum_{i=2}^k v_i\right)=u_1.\] Since \(\mathcal{E}_f(\lambda_1)=\operatorname{Ker}((f-\lambda_1\mathrm{Id}_V)^{n_1})\) for some integer \(n_1\) and \(g=f-\lambda_1\mathrm{Id}_V,\) applying \(g^{n_1},\) we obtain \[g^{n_1+1}\left(v-\sum_{i=2}^k v_i\right)=g^{n_1}(u_1)=0_V,\] where the last equality uses that \(u_1 \in \mathcal{E}_f(\lambda_1).\) It follows that \(v-\sum_{i=2}^k v_i \in \mathcal{E}_f(\lambda_1)\) and hence that \(v=\left(v-\sum_{i=2}^k v_i\right)+\sum_{i=2}^k v_i \in U,\) which again contradicts the fact that \(v\) is a non-zero element of \(U^{\prime}\) and \(U\cap U^{\prime}=\{0_V\}.\)
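As a sanity check of the decomposition, one can compare the dimensions of the eigenspaces with those of the generalised eigenspaces for a concrete non-diagonalisable matrix. The Python sketch below is only an illustration under stated assumptions: the matrix and its eigenvalues \(2\) and \(5\) (found by hand from the characteristic polynomial \((\lambda-2)^2(\lambda-5)\)) are our own example, the helper names are hypothetical, and we compute \(\mathcal{E}_\mathbf{A}(\lambda)\) as \(\operatorname{Ker}((\mathbf{A}-\lambda\mathbf{1}_n)^n),\) using the standard fact that the chain of kernels is stationary after at most \(n\) steps.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(m):
    """Rank via Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def shifted_power(a, lam, k):
    """Return (A - lam*1)^k as a list of rows."""
    n = len(a)
    g = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, g)
    return power

def eig_dim(a, lam):
    """Dimension of the eigenspace Ker(A - lam*1)."""
    return len(a) - rank(shifted_power(a, lam, 1))

def gen_eig_dim(a, lam):
    """Dimension of Ker((A - lam*1)^n), taken as the generalised eigenspace."""
    n = len(a)
    return n - rank(shifted_power(a, lam, n))

# Hypothetical example; eigenvalues determined by hand.
A = [[1, 1, 0],
     [-1, 3, 0],
     [0, 0, 5]]
eigenvalues = [2, 5]

print(sum(eig_dim(A, lam) for lam in eigenvalues))      # 2, less than dim V = 3
print(sum(gen_eig_dim(A, lam) for lam in eigenvalues))  # 3, equal to dim V
```

The eigenspaces only account for a subspace of dimension \(2,\) so this matrix is not diagonalisable, while the dimensions of the generalised eigenspaces add up to \(3=\dim V,\) which is consistent with the statement just proved.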
Each generalised eigenspace \(\mathcal{E}_f(\lambda_i)\) is stable under \(f.\) Therefore, if we fix an ordered basis \(\mathbf{b}_i\) of \(\mathcal{E}_f(\lambda_i),\) then we obtain matrices \(\mathbf{A}_i=\mathbf{M}(f|_{\mathcal{E}_{f}(\lambda_i)},\mathbf{b}_i,\mathbf{b}_i)\) and the matrix representation of \(f : V \to V\) with respect to the ordered basis \(\mathbf{b}\) of \(V\) obtained by joining the ordered bases \(\mathbf{b}_1,\ldots,\mathbf{b}_k\) takes the block diagonal form (where unprinted entries are understood to be zero) \[\begin{pmatrix} \mathbf{A}_1 & & & \\ & \mathbf{A}_2 & & \\ & & \ddots & \\ & & & \mathbf{A}_k \end{pmatrix}.\] We write \(\operatorname{diag}(\mathbf{A}_1,\mathbf{A}_2,\ldots,\mathbf{A}_k)\) for such a block diagonal matrix.
Let \[\mathbf{A}_1=\begin{pmatrix} 1 & -3 \\ 4 & 8 \end{pmatrix}, \qquad \mathbf{A}_2=\begin{pmatrix} 2 \end{pmatrix},\qquad \mathbf{A}_3=\begin{pmatrix} 7 & -5 & 2 \\ 0 & 1 & -1 \\ 9 & 2 & 0 \end{pmatrix},\] then we have \[\operatorname{diag}(\mathbf{A}_1,\mathbf{A}_2,\mathbf{A}_3)=\begin{pmatrix} 1 & -3 & 0 & 0 & 0 & 0 \\ 4 & 8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 7 & -5 & 2 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 9 & 2 & 0 \end{pmatrix}.\]
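The assembly of a block diagonal matrix from its blocks is easily expressed in code. The following Python sketch, with a hypothetical helper `block_diag` written only for this illustration, places each square block along the diagonal and fills the remaining entries with zeros. It reproduces the matrix of the example above.

```python
def block_diag(*blocks):
    """Assemble square blocks (lists of rows) into one block diagonal matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0  # index of the top-left corner of the current block
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                out[offset + i][offset + j] = x
        offset += len(b)
    return out

A1 = [[1, -3],
      [4, 8]]
A2 = [[2]]
A3 = [[7, -5, 2],
      [0, 1, -1],
      [9, 2, 0]]

D = block_diag(A1, A2, A3)
for row in D:
    print(row)
```

The size of the result is the sum of the sizes of the blocks, here \(2+1+3=6.\)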