12 The Jordan normal form
12.1 Generalised eigenvectors and eigenspaces
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g :V \to V\) an endomorphism. Then the eigenspaces \(\operatorname{Eig}_{g}(\lambda)\) of \(g\) are in direct sum. In particular, if \(v_1,\ldots,v_m\) are eigenvectors corresponding to distinct eigenvalues of \(g,\) then the vectors \(v_1,\ldots,v_m\) are linearly independent.
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) A non-zero vector \(v \in V\) is called a generalised eigenvector of \(f\) with eigenvalue \(\lambda \in \mathbb{K}\) if \[(f-\lambda\mathrm{Id}_V)^m(v)=0_V\] for some integer \(m\in \mathbb{N}.\) If a generalised eigenvector \(v\) satisfies \((f-\lambda\mathrm{Id}_V)^m(v)=0_V\) and \((f-\lambda\mathrm{Id}_V)^{m-1}(v)\neq 0_V,\) then \(v\) is said to have rank \(m\).
Notice that a generalised eigenvector of \(f : V \to V\) of rank \(1\) and with eigenvalue \(\lambda\) satisfies \[(f-\lambda\mathrm{Id}_V)(v)=0_V \qquad \text{and}\qquad \mathrm{Id}_V(v)\neq 0_V.\]Equivalently, \[f(v)=\lambda v \qquad \text{and}\qquad v \neq 0_V.\] Generalised eigenvectors of rank \(1\) are thus precisely the usual eigenvectors.
Finding the right definition of a generalised eigenspace is a bit trickier.
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) For all \(\lambda \in \mathbb{K}\) we define the generalised \(\lambda\)-eigenspace of \(f\) to be the set \[\mathcal{E}_f(\lambda)=\bigcup_{k=0}^{\infty}\operatorname{Ker}((f-\lambda\mathrm{Id}_V)^k).\]
The previous definition, while convenient for proofs, is not particularly handy for computations. Observe however that if \(g : V \to V\) is an endomorphism of a \(\mathbb{K}\)-vector space \(V,\) then \[\{0_V\}=\operatorname{Ker}(g^0)\subset \operatorname{Ker}(g^1)\subset \operatorname{Ker}(g^2)\subset \operatorname{Ker}(g^3) \subset \cdots\] and correspondingly we have \[0\leqslant\dim \operatorname{Ker}(g) \leqslant \dim \operatorname{Ker}(g^2)\leqslant \dim \operatorname{Ker}(g^3) \leqslant \cdots\] Now take \(g=f-\lambda\mathrm{Id}_V.\) If \(V\) is finite dimensional, then \(\dim \operatorname{Ker}((f-\lambda\mathrm{Id}_V)^k)\) is at most \(\dim V\) for all \(k \in \mathbb{N},\) so this non-decreasing sequence of dimensions must eventually become constant. Since the kernels are nested, they agree from some index onwards, and therefore there exists an integer \(m \in \mathbb{N}\) so that the generalised \(\lambda\)-eigenspace of \(f\) satisfies \[\mathcal{E}_f(\lambda)=\operatorname{Ker}((f-\lambda\mathrm{Id}_V)^m).\]
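The chain of kernel dimensions can be computed explicitly for a matrix. The following is a minimal sketch in plain Python with exact rational arithmetic; the helper names `mat_mul`, `rank` and `kernel_dims` are our own choices and not part of the text. It uses the rank-nullity theorem to obtain \(\dim\operatorname{Ker}\) from the rank, and it relies on the standard fact (not proved above) that once two consecutive kernels agree, all later ones agree as well, so the loop may stop at the first repeated dimension.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def rank(A):
    """Rank of A by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def kernel_dims(A, lam):
    """Return [dim Ker(G^0), dim Ker(G^1), ...] for G = A - lam * 1_n.

    The list stops as soon as the dimension no longer increases
    (assumption: equality of two consecutive kernels is permanent).
    """
    n = len(A)
    G = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # G^0
    dims = [0]  # Ker(G^0) = {0}
    while True:
        power = mat_mul(power, G)
        d = n - rank(power)  # rank-nullity theorem
        if d == dims[-1]:
            return dims
        dims.append(d)


A = [[3, 1, 0],
     [0, 3, 0],
     [0, 0, 5]]
print(kernel_dims(A, 3))  # [0, 1, 2]
print(kernel_dims(A, 5))  # [0, 1]
print(kernel_dims(A, 4))  # [0]
```

The last entry of the returned list is the dimension of the generalised eigenspace, and the length of the list minus one is an admissible value of \(m.\) For a scalar that is not an eigenvalue the list is just `[0]`.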
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\) Then \(\mathcal{E}_f(\lambda)\neq \{0_V\}\) if and only if \(\lambda\) is an eigenvalue of \(f.\)
Proof. If \(\lambda\) is an eigenvalue of \(f\) then there exists a non-zero vector \(v \in \operatorname{Ker}(f-\lambda\mathrm{Id}_V)\) and hence \(\dim \mathcal{E}_f(\lambda)>0\) so that \(\mathcal{E}_f(\lambda)\neq \{0_V\}.\) Conversely, suppose \(\mathcal{E}_f(\lambda)\neq \{0_V\}\) so that there exists an integer \(m\geqslant 1\) and a non-zero vector \(v \in V\) such that \((f-\lambda\mathrm{Id}_V)^m(v)=0_V.\) We may assume \(m\) to be the smallest such integer. Then, by minimality of \(m,\) the vector \(w=(f-\lambda\mathrm{Id}_V)^{m-1}(v)\) is non-zero. Moreover, \((f-\lambda\mathrm{Id}_V)(w)=(f-\lambda\mathrm{Id}_V)^m(v)=0_V,\) that is, \(f(w)=\lambda w,\) and hence \(w\) is an eigenvector of \(f\) with eigenvalue \(\lambda.\)
By a generalised eigenvector or generalised eigenspace of a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) we mean those of \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\)
Consider \[\mathbf{A}=\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}.\] The characteristic polynomial of \(\mathbf{A}\) is \(\operatorname{char}_\mathbf{A}(\lambda)=(\lambda-3)^2,\) hence we have a single eigenvalue \(3\) of algebraic multiplicity \(2.\) A simple calculation gives that \(\operatorname{Eig}_\mathbf{A}(3)=\operatorname{span}\{\vec{e}_1\}.\) Now \[(\mathbf{A}-3\cdot\mathbf{1}_{2})^2=\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2=\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},\] hence \(\vec{e}_2\) satisfies \((\mathbf{A}-3\cdot\mathbf{1}_{2})^2\vec{e}_2=0_{\mathbb{K}^2}\) and \((\mathbf{A}-3\cdot\mathbf{1}_{2})\vec{e}_2=\vec{e}_1\neq 0_{\mathbb{K}^2}.\) Therefore, \(\vec{e}_2\) is a generalised eigenvector of \(\mathbf{A}\) of rank \(2\) with eigenvalue \(3.\) Since \(\vec{e}_1\) also lies in \(\mathcal{E}_{\mathbf{A}}(3),\) we thus have \(\mathcal{E}_{\mathbf{A}}(3)=\operatorname{span}\{\vec{e}_1,\vec{e}_2\}.\)
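The rank computation in this example can be checked mechanically. The sketch below, in plain Python, repeatedly applies \(\mathbf{A}-\lambda\mathbf{1}_n\) to a vector and reports the first power that sends it to zero. The function names and the cut-off parameter `max_power` are hypothetical choices of ours; the default cut-off \(n\) assumes the standard fact that the chain of kernels in \(\mathbb{K}^n\) stabilises after at most \(n\) steps.

```python
def mat_vec(A, v):
    """Apply the matrix A (list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def generalised_rank(A, lam, v, max_power=None):
    """Smallest m >= 1 with (A - lam * 1_n)^m v = 0, or None.

    None is returned for the zero vector (excluded by definition) and
    for vectors not annihilated within max_power steps (default n,
    an assumption stated in the accompanying text).
    """
    n = len(A)
    if max_power is None:
        max_power = n
    G = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    w = list(v)
    if all(x == 0 for x in w):
        return None
    for m in range(1, max_power + 1):
        w = mat_vec(G, w)
        if all(x == 0 for x in w):
            return m
    return None


A = [[3, 1],
     [0, 3]]
print(generalised_rank(A, 3, [1, 0]))  # 1, so e_1 is a usual eigenvector
print(generalised_rank(A, 3, [0, 1]))  # 2, so e_2 has rank 2
print(generalised_rank(A, 2, [0, 1]))  # None, since 2 is not an eigenvalue
```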
Recall that an eigenspace of an endomorphism \(f : V \to V\) is a subspace of \(V\) that is stable under \(f.\) The same holds true for generalised eigenspaces.
Let \(f : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V\) and \(\lambda \in \mathbb{K}.\) Then \(\mathcal{E}_{f}(\lambda)\) is a subspace of \(V\) that is stable under \(f.\)
Let \(V\) be a \(\mathbb{K}\)-vector space. A subset \(U\subset V\) is called a vector subspace of \(V\) if \(U\) is non-empty and if \[\tag{3.8} s_1\cdot_Vv_1+_Vs_2\cdot_V v_2 \in U\quad \text{for all}\; s_1,s_2 \in \mathbb{K}\; \text{and all}\; v_1,v_2 \in U.\]
We now show that \(\mathcal{E}_f(\lambda)\) is stable under \(f.\) Let \(v \in \mathcal{E}_f(\lambda)\) so that there exists \(k\geqslant 0\) with \((f-\lambda\mathrm{Id}_V)^k(v)=0_V.\) Write \(w=f(v).\) Then we obtain \[\begin{aligned} (f-\lambda\mathrm{Id}_V)^{k}(w)&=(f-\lambda\mathrm{Id}_V)^{k}(f(v)-\lambda v+\lambda v)\\ &=(f-\lambda\mathrm{Id}_V)^k(f(v)-\lambda v)+\lambda (f-\lambda\mathrm{Id}_V)^{k}(v)\\ &=(f-\lambda\mathrm{Id}_V)^{k+1}(v)+\lambda (f-\lambda\mathrm{Id}_V)^k(v)=0_V. \end{aligned}\] Therefore \(w=f(v) \in \mathcal{E}_f(\lambda)\) and hence \(\mathcal{E}_f(\lambda)\) is stable under \(f.\)
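The identity used in this computation, \((f-\lambda\mathrm{Id}_V)^{k}(f(v))=(f-\lambda\mathrm{Id}_V)^{k+1}(v)+\lambda (f-\lambda\mathrm{Id}_V)^k(v),\) holds for every vector \(v\) and can be tested on a concrete matrix. The following plain Python sketch does so; the matrix, the vectors and the helper names are illustrative assumptions of ours and do not appear in the text.

```python
def mat_vec(A, v):
    """Apply the matrix A (list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def shifted(A, lam):
    """Return the matrix A - lam * 1_n."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def apply_power(G, v, k):
    """Return G^k v by applying G to v exactly k times."""
    w = list(v)
    for _ in range(k):
        w = mat_vec(G, w)
    return w


def stability_identity_holds(A, lam, v, k):
    """Check G^k (A v) == G^(k+1) v + lam * G^k v for G = A - lam * 1_n."""
    G = shifted(A, lam)
    lhs = apply_power(G, mat_vec(A, v), k)
    rhs = [a + lam * b
           for a, b in zip(apply_power(G, v, k + 1), apply_power(G, v, k))]
    return lhs == rhs


A = [[3, 1, 0],
     [0, 3, 0],
     [0, 0, 5]]
G = shifted(A, 3)
v = [0, 1, 0]        # satisfies (A - 3 * 1_3)^2 v = 0
w = mat_vec(A, v)    # w plays the role of f(v)

print(apply_power(G, v, 2))  # [0, 0, 0]
print(w)                     # [1, 3, 0]
print(apply_power(G, w, 2))  # [0, 0, 0], so w lies in the same kernel
print(stability_identity_holds(A, 3, [1, 2, 3], 2))  # True
```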
As for usual eigenspaces, generalised eigenspaces are also in direct sum:
Let \(f : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then the generalised eigenspaces of \(f\) are in direct sum.
Proof. Let \(\lambda_1,\ldots,\lambda_k\) be distinct eigenvalues of \(f\) and let \(n_i\) for \(1\leqslant i\leqslant k\) be such that \(\mathcal{E}_f(\lambda_i)=\operatorname{Ker}((f-\lambda_i\mathrm{Id}_V)^{n_i}).\) For \(1\leqslant i\leqslant k\) let \(v_i,\hat{v}_i \in \mathcal{E}_{f}(\lambda_i)\) be such that \[\tag{12.2} v_1+v_2+\cdots+v_k=\hat{v}_1+\hat{v}_2+\cdots+\hat{v}_k\] We want to show that \(w_i=v_i-\hat{v}_i=0_V\) for all \(1\leqslant i\leqslant k.\) For \(1\leqslant i\leqslant k\) consider the endomorphism \[g_i=(f-\lambda_1\mathrm{Id}_V)^{n_1}\circ \cdots\circ (f-\lambda_{i-1}\mathrm{Id}_V)^{n_{i-1}}\circ (f-\lambda_{i+1}\mathrm{Id}_V)^{n_{i+1}}\circ \cdots\circ (f-\lambda_k\mathrm{Id}_V)^{n_k}.\] Notice that \(g_i\) does not contain the mapping \((f-\lambda_i\mathrm{Id}_V)^{n_i}.\) For \(i \neq j\) the mapping \(g_i\) contains \((f-\lambda_j\mathrm{Id}_V)^{n_j}.\) Rearranging the mappings in \(g_i\) if necessary, we can assume that \(g_i=h\circ (f-\lambda_j\mathrm{Id}_V)^{n_j}\) for some endomorphism \(h.\) Rearranging does not change \(g_i\) since for all \(\mu_1,\mu_2 \in \mathbb{K}\) we have \[(f-\mu_1\mathrm{Id}_V)\circ (f-\mu_2\mathrm{Id}_V)=(f-\mu_2\mathrm{Id}_V)\circ (f-\mu_1\mathrm{Id}_V).\] Since \(w_j \in \mathcal{E}_{f}(\lambda_j)=\operatorname{Ker}((f-\lambda_j\mathrm{Id}_V)^{n_j})\) we thus conclude that \(g_i(w_j)=0_V.\)
We now obtain the desired improvement of (12.1), which holds without assuming that \(f\) is diagonalisable.
Let \(f : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{C}\)-vector space \(V\) of dimension \(n\geqslant 1\) and let \(\lambda_1,\ldots,\lambda_k\) denote the distinct eigenvalues of \(f.\) Then we have \[\mathcal{E}_f(\lambda_1)\oplus \mathcal{E}_f(\lambda_2) \oplus \cdots \oplus \mathcal{E}_f(\lambda_k)=V.\]
Let \(U\) be a subspace of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then there exists a subspace \(U^{\prime}\) so that \(V=U\oplus U^{\prime}.\)
Let \(g : V \to V\) be an endomorphism of a complex vector space \(V\) of dimension \(n\geqslant 1.\) Then \(g\) admits at least one eigenvalue. Moreover, the sum of the algebraic multiplicities of the eigenvalues of \(g\) is equal to \(n.\) In particular, if \(\mathbf{A}\in M_{n,n}(\mathbb{C})\) is a matrix, then there is at least one eigenvalue of \(\mathbf{A}.\)
We conclude that we can find an integer \(i\) with \(1\leqslant i\leqslant k\) such that \(\lambda_i=\mu.\) After possibly renumbering the eigenvalues we can assume that \(\lambda_1=\mu\) and hence that \(\lambda_i\neq \mu\) for \(2\leqslant i\leqslant k,\) since the eigenvalues are distinct. So again for \(2\leqslant i\leqslant k\) we have vectors \(v_i \in \mathcal{E}_{f}(\lambda_i)\) such that \(g(v_i)=u_i.\) We thus have \[g\left(v-\sum_{i=2}^k v_i\right)=u_1.\] Since \(\mathcal{E}_f(\lambda_1)=\operatorname{Ker}((f-\lambda_1\mathrm{Id}_V)^{n_1})\) for some integer \(n_1\) and \(g=f-\lambda_1\mathrm{Id}_V,\) applying \(g^{n_1},\) we obtain \[g^{n_1+1}\left(v-\sum_{i=2}^k v_i\right)=g^{n_1}(u_1)=0_V,\] where the last equality uses that \(u_1 \in \mathcal{E}_f(\lambda_1).\) It follows that \(v-\sum_{i=2}^k v_i \in \mathcal{E}_f(\lambda_1)\) and hence that \(v \in U\) which is again a contradiction to the assumption that \(v \in U^{\prime}.\)
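A consequence of the decomposition \(V=\mathcal{E}_f(\lambda_1)\oplus \cdots \oplus \mathcal{E}_f(\lambda_k)\) is that the dimensions of the generalised eigenspaces add up to \(n.\) The sketch below checks this for one matrix in plain Python with exact rational arithmetic. The matrix and the helper names are our own illustrative choices, the eigenvalues \(3\) and \(5\) are supplied by hand rather than computed, and the loop again assumes the standard fact that the kernel dimensions stop growing permanently at the first repeat.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def rank(A):
    """Rank of A by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def generalised_eigenspace_dim(A, lam):
    """Dimension of the union of Ker((A - lam * 1_n)^k) over all k."""
    n = len(A)
    G = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # G^0
    dim = 0
    while True:
        power = mat_mul(power, G)
        d = n - rank(power)  # rank-nullity theorem
        if d == dim:         # the chain of kernels has stabilised
            return dim
        dim = d


# Characteristic polynomial (x - 3)^2 (x - 5); eigenvalues given by hand.
A = [[2, 1, 0],
     [-1, 4, 0],
     [0, 0, 5]]
dims = [generalised_eigenspace_dim(A, lam) for lam in (3, 5)]
print(dims)       # [2, 1]
print(sum(dims))  # 3, the dimension of the whole space
```

Note that the usual eigenspace for the eigenvalue \(3\) of this matrix is only one-dimensional, so the usual eigenspaces alone do not fill the space.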
Each generalised eigenspace \(\mathcal{E}_f(\lambda_i)\) is stable under \(f.\) Therefore, if we fix an ordered basis \(\mathbf{b}_i\) of \(\mathcal{E}_f(\lambda_i),\) then we obtain matrices \(\mathbf{A}_i=\mathbf{M}(f|_{\mathcal{E}_{f}(\lambda_i)},\mathbf{b}_i,\mathbf{b}_i)\) and the matrix representation of \(f : V \to V\) with respect to the ordered basis \(\mathbf{b}\) of \(V\) obtained by joining the ordered bases \(\mathbf{b}_1,\ldots,\mathbf{b}_k\) takes the block diagonal form (where unprinted entries are understood to be zero) \[\begin{pmatrix} \mathbf{A}_1 & & & \\ & \mathbf{A}_2 & & \\ & & \ddots & \\ & & & \mathbf{A}_k \end{pmatrix}\] We write \(\operatorname{diag}(\mathbf{A}_1,\mathbf{A}_2,\ldots,\mathbf{A}_k)\) for such a block diagonal matrix.
Let \[\mathbf{A}_1=\begin{pmatrix} 2 \end{pmatrix}, \qquad \mathbf{A}_2=\begin{pmatrix} 1 & -3 \\ 4 & 8 \end{pmatrix},\qquad \mathbf{A}_3=\begin{pmatrix} 7 & -5 & 2 \\ 0 & 1 & -1 \\ 9 & 2 & 0 \end{pmatrix},\] then we have \[\operatorname{diag}(\mathbf{A}_1,\mathbf{A}_2,\mathbf{A}_3)=\begin{pmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & -3 & 0 & 0 & 0 \\ 0 & 4 & 8 & 0 & 0 & 0 \\ 0 & 0 & 0 & 7 & -5 & 2 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 9 & 2 & 0 \end{pmatrix}.\]
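The assembly of \(\operatorname{diag}(\mathbf{A}_1,\ldots,\mathbf{A}_k)\) from its blocks can be sketched in a few lines of plain Python. The function name `block_diag` is our own choice for this illustration, and the sketch assumes that every block is a square matrix given as a list of rows.

```python
def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(len(B) for B in blocks)
    result = [[0] * n for _ in range(n)]
    offset = 0  # row and column index where the next block starts
    for B in blocks:
        size = len(B)
        for i in range(size):
            for j in range(size):
                result[offset + i][offset + j] = B[i][j]
        offset += size
    return result


A1 = [[2]]
A2 = [[1, -3],
      [4, 8]]
A3 = [[7, -5, 2],
      [0, 1, -1],
      [9, 2, 0]]

for row in block_diag(A1, A2, A3):
    print(row)
# [2, 0, 0, 0, 0, 0]
# [0, 1, -3, 0, 0, 0]
# [0, 4, 8, 0, 0, 0]
# [0, 0, 0, 7, -5, 2]
# [0, 0, 0, 0, 1, -1]
# [0, 0, 0, 9, 2, 0]
```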