11 Endomorphisms, I

In this chapter we study linear mappings from a vector space to itself.

Definition 11.1 • Endomorphism

A linear map \(g : V \to V\) from a \(\mathbb{K}\)-vector space \(V\) to itself is called an endomorphism. An endomorphism that is also an isomorphism is called an automorphism.

Working with endomorphisms has a different flavour than working with linear maps between two different spaces. In particular, if we want to write \(g\) as a matrix, it clearly makes sense to work with just one coordinate system for \(V\) – that is, we want to consider the matrices \(M(g, \mathbf{b}, \mathbf{b})\) for \(\mathbf{b}\) an ordered basis of \(V.\)

11.1 Matrices of endomorphisms

Similarity

Let \(V\) be a finite dimensional vector space equipped with an ordered basis \(\mathbf{b}\) and \(g : V \to V\) an endomorphism. As a special case of Theorem 8.26, we see that if we consider another ordered basis \(\mathbf{b}^{\prime}\) of \(V,\) then \[\mathbf{M}(g,\mathbf{b}^{\prime},\mathbf{b}^{\prime})=\mathbf{C}\,\mathbf{M}(g,\mathbf{b},\mathbf{b})\,\mathbf{C}^{-1},\] where we write \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime})\) for the change of basis matrix. This motivates the following definition:

Definition 11.2 • Similar / conjugate matrices

Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{A}^{\prime} \in M_{n,n}(\mathbb{K}).\) The matrices \(\mathbf{A}\) and \(\mathbf{A}^{\prime}\) are called similar or conjugate over \(\mathbb{K}\) if there exists an invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K})\) such that \[\mathbf{A}^{\prime} =\mathbf{C}\mathbf{A}\mathbf{C}^{-1}.\]

Similarity of matrices over \(\mathbb{K}\) is an equivalence relation:

Proposition 11.3

Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B},\mathbf{X}\in M_{n,n}(\mathbb{K}).\) Then we have

  1. \(\mathbf{A}\) is similar to itself;

  2. If \(\mathbf{A}\) is similar to \(\mathbf{B}\), then \(\mathbf{B}\) is similar to \(\mathbf{A}\);

  3. If \(\mathbf{A}\) is similar to \(\mathbf{B}\) and \(\mathbf{B}\) is similar to \(\mathbf{X},\) then \(\mathbf{A}\) is also similar to \(\mathbf{X}.\)

Proof. (i) We take \(\mathbf{C}=\mathbf{1}_{n}.\)

(ii) Suppose \(\mathbf{A}\) is similar to \(\mathbf{B}\) so that \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) for some invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) Multiplying with \(\mathbf{C}^{-1}\) from the left and \(\mathbf{C}\) from the right, we get \[\mathbf{C}^{-1}\mathbf{B}\mathbf{C}=\mathbf{C}^{-1}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{C}=\mathbf{A},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{C}^{-1}.\)

(iii) We have \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) and \(\mathbf{X}=\mathbf{D}\mathbf{B}\mathbf{D}^{-1}\) for invertible matrices \(\mathbf{C},\mathbf{D}.\) Then we get \[\mathbf{X}=\mathbf{D}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{D}^{-1},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{D}\mathbf{C}.\)

Remark 11.4

  • Because of (ii) in particular, one can say that two matrices \(\mathbf{A}\) and \(\mathbf{B}\) are similar without ambiguity.

  • Theorem 8.26 shows that \(\mathbf{A}\) and \(\mathbf{B}\) are similar if and only if there exists an endomorphism \(g\) of \(\mathbb{K}^n\) such that \(\mathbf{A}\) and \(\mathbf{B}\) represent \(g\) with respect to two ordered bases of \(\mathbb{K}^n.\)

The main goal of this chapter (and the next) is: given an endomorphism \(g,\) how can we choose a basis \(\mathbf{b}\) which makes the matrix of \(g\) as nice as possible? Equivalently, given a square matrix \(\mathbf{A},\) is there a “nicest” matrix among all the matrices similar to \(\mathbf{A}\)?

Remark 11.5

This should remind you a lot of row echelon form. The RREF of \(\mathbf{A}\) is a unique “best” representative among all the matrices which are row-equivalent to \(\mathbf{A}.\) We’re now looking for a unique “best” representative among all matrices similar to \(\mathbf{A}.\)

(This sort of classification problem – define some equivalence relation, and then look for a nicest representative of each equivalence class – comes up a great deal in many areas of mathematics.)

Invariants of similarity classes

As a first step, we want to study functions \(f : M_{n,n}(\mathbb{K})\to \mathbb{K}\) which are invariant under conjugation, that is, \(f\) satisfies \(f(\mathbf{C}\mathbf{A}\mathbf{C}^{-1})=f(\mathbf{A})\) for all \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and all invertible matrices \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) We have already seen an example of such a function, namely the determinant. Indeed, using the product rule (Proposition 10.1) and Corollary 10.2, we compute \[\tag{11.1} \begin{aligned} \det \left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)&=\det(\mathbf{C}\mathbf{A})\det\left(\mathbf{C}^{-1}\right)=\det(\mathbf{C})\det(\mathbf{A})\det\left(\mathbf{C}^{-1}\right)\\ &=\det(\mathbf{A}). \end{aligned}\] Because of this fact, the following definition makes sense:

Definition 11.6 • Determinant of an endomorphism

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\det(g) = \det\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 8.26 and (11.1), the scalar \(\det(g)\) is independent of the chosen ordered basis.
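The invariance (11.1) is easy to check numerically. Below is a minimal NumPy sketch with a randomly chosen matrix and conjugating matrix; it is an illustrative check only, not part of the formal development.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # an arbitrary 4x4 matrix
C = rng.standard_normal((4, 4))      # a random matrix, almost surely invertible

A_conj = C @ A @ np.linalg.inv(C)    # the conjugate matrix C A C^{-1}

# the determinant is invariant under conjugation, up to floating-point error
print(np.isclose(np.linalg.det(A_conj), np.linalg.det(A)))  # True
```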

Another example of a scalar that we can associate to an endomorphism is the so-called trace. As with the determinant, we first define the trace for matrices. Luckily, the trace is a lot simpler to define:

Definition 11.7 • Trace of a matrix

Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\) The sum \(\sum_{i=1}^n [\mathbf{A}]_{ii}\) of its diagonal entries is called the trace of \(\mathbf{A}\) and denoted by \(\operatorname{Tr}(\mathbf{A})\) or \(\operatorname{Tr}\mathbf{A}.\)

Example 11.8

For all \(n \in \mathbb{N}\) we have \(\operatorname{Tr}(\mathbf{1}_{n})=n.\) For \[\mathbf{A}=\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\] we have \(\operatorname{Tr}(\mathbf{A})=2+2+3=7.\)

The trace of a product of two square matrices does not depend on the order of the factors:

Proposition 11.9

Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K}).\) Then we have \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]

Proof. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i,j\leqslant n}.\) Then \[[\mathbf{A}\mathbf{B}]_{ij}=\sum_{k=1}^n A_{ik}B_{kj} \qquad \text{and}\qquad [\mathbf{B}\mathbf{A}]_{kj}=\sum_{i=1}^n B_{ki}A_{ij},\] so that \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\sum_{i=1}^n\sum_{k=1}^n A_{ik}B_{ki}=\sum_{k=1}^n\sum_{i=1}^n B_{ki}A_{ik}=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]
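Proposition 11.9 can also be checked numerically. The following NumPy sketch compares \(\operatorname{Tr}(\mathbf{A}\mathbf{B})\) and \(\operatorname{Tr}(\mathbf{B}\mathbf{A})\) for randomly chosen matrices; it is an illustrative check only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Tr(AB) = Tr(BA), even though AB and BA are in general different matrices
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
print(np.allclose(A @ B, B @ A))                     # typically False
```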

Using the previous proposition, we obtain \[\tag{11.2} \operatorname{Tr}\left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)=\operatorname{Tr}\left(\mathbf{A}\mathbf{C}^{-1}\mathbf{C}\right)=\operatorname{Tr}(\mathbf{A}).\] As for the determinant, the following definition thus makes sense:

Definition 11.10 • Trace of an endomorphism

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\operatorname{Tr}(g) = \operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 8.26 and (11.2), the scalar \(\operatorname{Tr}(g)\) is independent of the chosen ordered basis.

The trace and determinant of endomorphisms behave nicely with respect to composition of maps:

Proposition 11.11

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space. Then, for all endomorphisms \(f,g :V \to V\) we have

  1. \(\operatorname{Tr}(f\circ g)=\operatorname{Tr}(g\circ f)\);

  2. \(\det(f\circ g)=\det(f)\det(g).\)

Proof. (i) Fix an ordered basis \(\mathbf{b}\) of \(V.\) Then, using Corollary 8.19 and Proposition 11.9, we obtain \[\begin{aligned} \operatorname{Tr}(f\circ g)&=\operatorname{Tr}\left(\mathbf{M}(f\circ g,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(f,\mathbf{b},\mathbf{b})\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\\ &=\operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\mathbf{M}(f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(g\circ f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}(g\circ f). \end{aligned}\] The proof of (ii) is analogous, but we use Proposition 10.1 instead of Proposition 11.9.

We also have:

Proposition 11.12

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. Then the following statements are equivalent:

  1. \(g\) is injective;

  2. \(g\) is surjective;

  3. \(g\) is bijective;

  4. \(\det(g) \neq 0.\)

Proof. The equivalence of the first three statements follows from Corollary 6.22. We fix an ordered basis \(\mathbf{b}\) of \(V.\) Suppose \(g\) is bijective with inverse \(g^{-1} : V \to V.\) Then we have \[\det (g\circ g^{-1})=\det(g)\det\left(g^{-1}\right)=\det\left(\mathrm{Id}_V\right)=\det\left(\mathbf{M}(\mathrm{Id}_V,\mathbf{b},\mathbf{b})\right)=\det\left(\mathbf{1}_{\dim V}\right)=1.\] It follows that \(\det(g)\neq 0\) and moreover that \[\det\left(g^{-1}\right)=\frac{1}{\det g}.\] Conversely, suppose that \(\det g\neq 0.\) Then \(\det \mathbf{M}(g,\mathbf{b},\mathbf{b}) \neq 0\) so that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is invertible by Corollary 10.2 and Proposition 8.20 implies that \(g\) is bijective.

Remark 11.13

Notice that assertions (i)–(iii) of Proposition 11.12 are not equivalent for infinite-dimensional vector spaces (where the determinant doesn’t make sense). For instance, consider \(V=\mathbb{K}^{\infty},\) the \(\mathbb{K}\)-vector space of sequences from Example 4.7; then the endomorphism \(g : V \to V\) defined by \((x_1,x_2,x_3,\ldots) \mapsto (0,x_1,x_2,x_3,\ldots)\) is injective but not surjective.

11.2 Detour: More on Subspaces

Before we develop the theory of endomorphisms further, we need to make a detour, by developing a bit more theory about subspaces of vector spaces.

Sums of subspaces

Definition 11.14 • Sum of subspaces

Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The set \[\sum_{i=1}^n U_i=U_1+U_2+\cdots +U_n=\{v \in V | v=u_1+u_2+\cdots + u_n \text{ for some } u_i \in U_i\}\] is called the sum of the subspaces \(U_i\).

Recall that by Proposition 4.22, the intersection of two subspaces is again a subspace, whereas the union of two subspaces fails to be a subspace in general. However, subspaces do behave nicely with regards to sums:

Proposition 11.15

The sum of the subspaces \(U_i\subset V,\) \(i=1,\ldots,n,\) is again a vector subspace of \(V.\)

Proof. The sum \(\sum_{i=1}^n U_i\) is non-empty, since it contains the zero vector \(0_V.\) Let \(v\) and \(v^{\prime} \in \sum_{i=1}^n U_i\) so that \[v=v_1+v_2+\cdots +v_n \qquad \text{and} \qquad v^{\prime}=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\] for vectors \(v_i,v^{\prime}_i \in U_i,\) \(i=1,\ldots,n.\) Each \(U_i\) is a vector subspace of \(V.\) Therefore, for all scalars \(s,t \in \mathbb{K},\) the vector \(s v_i+t v^{\prime}_i\) is an element of \(U_i,\) \(i=1,\ldots,n.\) Thus \[s v+t v^{\prime}=s v_1+t v^{\prime}_1+\cdots+s v_n+t v^{\prime}_n\] is an element of \(U_1+\cdots +U_n.\) By Definition 4.17, it follows that \(U_1+\cdots +U_n\) is a vector subspace of \(V.\)

Remark 11.16

Notice that \(U_1+\cdots+ U_n\) is the smallest vector subspace of \(V\) containing all vector subspaces \(U_i,\) \(i=1,\ldots,n.\)

Direct sums

If each vector in the sum can be written in a unique way as the sum of vectors from the subspaces, we say the subspaces are in direct sum:

Definition 11.17 • Direct sum of subspaces

Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The subspaces \(U_1,\ldots,U_n\) are said to be in direct sum if each vector \(w \in W=U_1+\cdots+U_n\) is in a unique way the sum of vectors \(v_i \in U_i\) for \(1\leqslant i\leqslant n.\) That is, if \(w=v_1+v_2+\cdots+v_n=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\) for vectors \(v_i,v^{\prime}_i \in U_i,\) then \(v_i=v^{\prime}_i\) for all \(1\leqslant i \leqslant n.\) We write \[\bigoplus_{i=1}^n U_i\] in case the subspaces \(U_1,\ldots,U_n\) are in direct sum.

Example 11.18

Let \(n \in \mathbb{N}\) and \(V=\mathbb{K}^n\) as well as \(U_i=\operatorname{span}\{\vec{e}_i\},\) where \(\{\vec{e}_1,\ldots,\vec{e}_n\}\) denotes the standard basis of \(\mathbb{K}^n.\) Then \(\mathbb{K}^n=\bigoplus_{i=1}^n U_i.\)

Remark 11.19

  1. Two subspaces \(U_1,U_2\) of \(V\) are in direct sum if and only if \(U_1\cap U_2=\{0_V\}.\) Indeed, suppose \(U_1\cap U_2=\{0_V\}\) and consider \(w=v_1+v_2=v_1^{\prime}+v_2^{\prime}\) with \(v_i,v_i^{\prime} \in U_i\) for \(i=1,2.\) We then have \(v_1-v_1^{\prime}=v_2^{\prime}-v_2 \in U_2,\) since \(U_2\) is a subspace. Since \(U_1\) is a subspace as well, we also have \(v_1-v_1^{\prime} \in U_1.\) Since \(v_1-v_1^{\prime}\) lies both in \(U_1\) and \(U_2,\) we must have \(v_1-v_1^{\prime}=0_V=v_2^{\prime}-v_2.\) Conversely, suppose \(U_1,U_2\) are in direct sum and let \(w \in (U_1\cap U_2).\) We can write \(w=w+0_V=0_V+w,\) since \(w \in U_1\) and \(w \in U_2.\) Since \(U_1,U_2\) are in direct sum, we must have \(w=0_V,\) hence \(U_1\cap U_2=\{0_V\}.\)

  2. Observe that if the subspaces \(U_1,\ldots,U_n\) are in direct sum and \(v_i \in U_i\) with \(v_i \neq 0_V\) for \(1\leqslant i\leqslant n,\) then the vectors \(\{v_1,\ldots,v_n\}\) are linearly independent. Indeed, if \(s_1,\ldots,s_n\) are scalars such that \[s_1v_1+s_2v_2+\cdots+s_n v_n=0_V=0_V+0_V+\cdots+0_V,\] then \(s_iv_i=0_V\) for all \(1\leqslant i\leqslant n.\) By assumption \(v_i\neq 0_V\) and hence \(s_i=0\) for all \(1\leqslant i\leqslant n.\)

Proposition 11.20

Let \(n \in \mathbb{N},\) \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,\ldots,U_n\) be subspaces of \(V.\) Let \(\mathbf{b}_i\) be an ordered basis of \(U_i\) for \(1\leqslant i\leqslant n.\) Then we have:

  1. The tuple of vectors obtained by listing all the vectors of the bases \(\mathbf{b}_i\) is a basis of \(V\) if and only if \(V=\bigoplus_{i=1}^n U_i.\)

  2. \(\dim(U_1+\cdots+U_n)\leqslant \dim(U_1)+\cdots+\dim (U_n)\) with equality if and only if the subspaces \(U_1,\ldots,U_n\) are in direct sum.

Proof. Part of an exercise.
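For concrete subspaces of \(\mathbb{K}^n\) given by basis vectors, criterion (ii) can be tested numerically: the subspaces are in direct sum exactly when the rank of the matrix listing all the basis vectors as columns equals the sum of their dimensions. A small NumPy sketch (the vectors below are illustrative choices):

```python
import numpy as np

# U1 = span{u1, u2} and U2 = span{u3} inside R^3 (illustrative vectors)
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
u3 = np.array([1.0, 1.0, 1.0])

B = np.column_stack([u1, u2, u3])     # all basis vectors, listed as columns
dim_sum = np.linalg.matrix_rank(B)    # dim(U1 + U2)

# in direct sum <=> dim(U1 + U2) = dim(U1) + dim(U2), by Proposition 11.20 (ii)
print(dim_sum == 2 + 1)               # True: U1 and U2 are in direct sum
```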

Complements

Definition 11.21 • Complement to a subspace

Let \(V\) be a \(\mathbb{K}\)-vector space and \(U\subset V\) a subspace. A subspace \(U^{\prime}\) of \(V\) such that \(V=U\oplus U^{\prime}\) is called a complement to \(U\).

Example 11.22

Notice that a complement need not be unique. Consider \(V=\mathbb{R}^2\) and \(U=\operatorname{span}\{\vec{e}_1\}.\) Let \(\vec{v} \in V.\) Then the subspace \(U^{\prime}=\operatorname{span}\{\vec{v}\}\) is a complement to \(U,\) provided \(\vec{e}_1,\vec{v}\) are linearly independent.

Corollary 11.23 • Existence of a complement

Let \(U\) be a subspace of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then there exists a subspace \(U^{\prime}\) so that \(V=U\oplus U^{\prime}.\)

Proof. Suppose \((v_1,\ldots,v_m)\) is an ordered basis of \(U.\) By Theorem 5.10, there exists a basis \(\{v_1,\ldots,v_m,v_{m+1},\ldots,v_n\}\) of \(V.\) Defining \(U^{\prime}=\operatorname{span}\{v_{m+1},\ldots,v_n\},\) Proposition 11.20 implies the claim.

The dimension of a sum of two subspaces equals the sum of the dimensions of the subspaces minus the dimension of the intersection:

Proposition 11.24

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,U_2\) subspaces of \(V.\) Then we have \[\dim(U_1+U_2)=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\]

Proof. Let \(r=\dim(U_1\cap U_2)\) and let \(\{u_1,\ldots,u_r\}\) be a basis of \(U_1\cap U_2.\) These vectors are linearly independent and elements of \(U_1,\) hence by Theorem 5.10, there exist vectors \(v_1,\ldots,v_{m-r}\) so that \(\mathcal{S}_1=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) is a basis of \(U_1.\) Likewise there exist vectors \(w_1,\ldots,w_{n-r}\) such that \(\mathcal{S}_2=\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) is a basis of \(U_2.\) Here \(m=\dim U_1\) and \(n=\dim U_2.\)

Now consider the set \(\mathcal{S}=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r},w_1,\ldots,w_{n-r}\}\) consisting of \(r+m-r+n-r=n+m-r\) vectors. If this set is a basis of \(U_1+U_2,\) then the claim follows, since then \(\dim(U_1+U_2)=n+m-r=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\)

We first show that \(\mathcal{S}\) generates \(U_1+U_2.\) Let \(y \in U_1+U_2\) so that \(y=x_1+x_2\) for vectors \(x_1 \in U_1\) and \(x_2 \in U_2.\) Since \(\mathcal{S}_1\) is a basis of \(U_1,\) we can write \(x_1\) as a linear combination of elements of \(\mathcal{S}_1.\) Likewise we can write \(x_2\) as a linear combination of elements of \(\mathcal{S}_2.\) It follows that \(\mathcal{S}\) generates \(U_1+U_2.\)

We need to show that \(\mathcal{S}\) is linearly independent. So suppose we have scalars \(s_1,\ldots,s_r,\) \(t_1,\ldots,t_{m-r},\) and \(q_{1},\ldots,q_{n-r},\) so that \[\underbrace{s_1u_1+\cdots+s_r u_r}_{=u} +\underbrace{t_1v_1+\cdots+t_{m-r}v_{m-r}}_{=v}+\underbrace{q_1w_1+\cdots+q_{n-r}w_{n-r}}_{=w}=0_V.\] Equivalently, \(w=-u-v,\) so that \(w \in U_1.\) Since \(w\) is a linear combination of elements of \(\mathcal{S}_2,\) we also have \(w \in U_2.\) Therefore, \(w \in U_1\cap U_2\) and there exist scalars \(\hat{s}_1,\ldots,\hat{s}_r\) such that \[w=\underbrace{\hat{s}_1u_1+\cdots+\hat{s}_r u_r}_{=\hat{u}}.\] This is equivalent to \(w-\hat{u}=0_V,\) or written out \[q_1w_1+\cdots+q_{n-r}w_{n-r}-\hat{s}_1u_1-\cdots-\hat{s}_r u_r=0_V.\] Since the vectors \(\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) are linearly independent, we conclude that \(q_1=\cdots=q_{n-r}=\hat{s}_1=\cdots=\hat{s}_r=0.\) It follows that \(w=0_V\) and hence \(u+v=0_V.\) Again, since \(\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) are linearly independent, we conclude that \(s_1=\cdots=s_r=t_1=\cdots=t_{m-r}=0\) and we are done.
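The dimension formula can be illustrated numerically for subspaces whose intersection is known by construction. In the sketch below (the vectors are illustrative choices), \(U_1=\operatorname{span}\{a,b\}\) and \(U_2=\operatorname{span}\{b,c\}\) inside \(\mathbb{R}^4\) with \(a,b,c\) linearly independent, so that \(U_1\cap U_2=\operatorname{span}\{b\}\):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 1.0, 1.0])

# U1 = span{a, b}, U2 = span{b, c}; by construction the intersection is span{b}
dim_U1, dim_U2, dim_cap = 2, 2, 1

# dim(U1 + U2) = rank of the matrix whose columns span U1 + U2
dim_sum = np.linalg.matrix_rank(np.column_stack([a, b, b, c]))

print(dim_sum == dim_U1 + dim_U2 - dim_cap)   # True: 3 = 2 + 2 - 1
```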

Remark 11.25

If you’ve seen the Inclusion-Exclusion Principle for counting the sizes of finite sets, it’s tempting to guess that the previous proposition generalises to three or more subspaces as follows: \[\begin{gathered} \dim(U_1 + U_2 + U_3)\, \stackrel{?}{=}\, \dim(U_1) + \dim(U_2) + \dim(U_3) \\ - \dim(U_1\cap U_2) - \dim(U_2\cap U_3) - \dim(U_3 \cap U_1) + \dim(U_1 \cap U_2 \cap U_3). \end{gathered}\] This is, surprisingly, false: taking \(U_i\) to be any three distinct lines through the origin in \(\mathbb{R}^2\) gives a counterexample, since then the left-hand side equals \(\dim(\mathbb{R}^2)=2,\) while all the pairwise and triple intersections are \(\{0_{\mathbb{R}^2}\},\) so the right-hand side equals \(1+1+1-0-0-0+0=3.\)

11.3 Eigenvectors and eigenvalues

Mappings \(g\) that have the same domain and codomain allow for the notion of a fixed point. Recall that an element \(x\) of a set \(\mathcal{X}\) is called a fixed point of a mapping \(g :\mathcal{X} \to \mathcal{X}\) if \(g(x)=x,\) that is, \(x\) agrees with its image under \(g.\) In Linear Algebra, a generalisation of the notion of a fixed point is that of an eigenvector. A vector \(v \in V\) is called an eigenvector of the linear map \(g : V \to V\) if \(v\) is merely scaled when applying \(g\) to \(v,\) that is, there exists a scalar \(\lambda \in \mathbb{K}\) – called an eigenvalue – such that \(g(v)=\lambda v.\) Clearly, the zero vector \(0_V\) will satisfy this condition for every choice of scalar \(\lambda.\) For this reason, eigenvectors are usually required to be different from the zero vector. In this terminology, non-zero fixed points \(v\) of \(g\) are simply eigenvectors with eigenvalue \(1,\) since they satisfy \(g(v)=v=1v.\)

It is natural to ask whether a linear map \(g : V \to V\) always admits an eigenvector. In the remaining part of this chapter we will answer this question and further develop our theory of linear maps, specifically endomorphisms. We start with some precise definitions.

Definition 11.26 • Eigenvector, eigenspace, eigenvalue

Let \(g : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\)

  • An eigenvector with eigenvalue \(\lambda \in \mathbb{K}\) is a non-zero vector \(v \in V\) such that \(g(v)=\lambda v.\)

  • If \(\lambda\in \mathbb{K}\) is an eigenvalue of \(g,\) the \(\lambda\)-eigenspace \(\operatorname{Eig}_{g}(\lambda)\) is the subspace of vectors \(v\in V\) satisfying \(g(v)=\lambda v.\)

  • The dimension of \(\operatorname{Eig}_{g}(\lambda)\) is called the geometric multiplicity of the eigenvalue \(\lambda.\)

  • The set of all eigenvalues of \(g\) is called the spectrum of \(g\).

  • For \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) we speak of eigenvalues, eigenvectors, eigenspaces and spectrum to mean those of the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\)

Remark 11.27

By definition, the zero vector \(0_V\) is not an eigenvector; it is, however, an element of the eigenspace \(\operatorname{Eig}_{g}(\lambda)\) for every eigenvalue \(\lambda.\)

Example 11.28

  1. The scalar \(0\) is an eigenvalue of an endomorphism \(g : V \to V\) if and only if the kernel of \(g\) is different from \(\{0_V\}.\) In the case where the kernel of \(g\) does not consist only of the zero vector, we have \(\operatorname{Ker}g=\operatorname{Eig}_{g}(0)\) and the geometric multiplicity of \(0\) is the nullity of \(g.\)

  2. The endomorphism \(f_\mathbf{D}: \mathbb{K}^n \to \mathbb{K}^n\) associated to a diagonal matrix with distinct diagonal entries \[\mathbf{D}=\begin{pmatrix} \lambda_1 & && \\ & \lambda_2 && \\ &&\ddots & \\ &&& \lambda_n \end{pmatrix}\] has spectrum \(\{\lambda_1,\ldots,\lambda_n\}\) and corresponding eigenspaces \(\operatorname{Eig}_{f_{\mathbf{D}}}(\lambda_i)=\operatorname{span}\{\vec{e}_i\}.\)

  3. Consider the \(\mathbb{R}\)-vector space \(\mathsf{P}(\mathbb{R})\) of polynomials and \(f=\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) differentiation with respect to the variable \(x.\) The kernel of \(f\) consists of the constant polynomials and hence \(0\) is an eigenvalue of \(f.\) For any non-zero scalar \(\lambda\) there is no non-zero polynomial \(p\) satisfying \(\frac{\mathrm{d}}{\mathrm{d}x} p=\lambda p,\) as for non-constant \(p\) the left-hand side has a smaller degree than the right-hand side, while for a non-zero constant \(p\) the left-hand side is zero but the right-hand side is not.

Previously we defined the trace and determinant for an endomorphism \(g : V \to V\) by observing that the trace and determinant of the matrix representation of \(g\) are independent of the chosen basis of \(V.\) Similarly, we can consider eigenvalues of \(g\) and eigenvalues of the matrix representation of \(g\) with respect to some ordered basis of \(V.\) Perhaps unsurprisingly, the eigenvalues are the same:

Proposition 11.29

Let \(g : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V\) of dimension \(n.\) Let \(\mathbf{b}\) be an ordered basis of \(V\) with corresponding linear coordinate system \(\boldsymbol{\beta}.\) Then \(v \in V\) is an eigenvector of \(g\) with eigenvalue \(\lambda \in \mathbb{K}\) if and only if \(\boldsymbol{\beta}(v) \in \mathbb{K}^n\) is an eigenvector with eigenvalue \(\lambda\) of \(\mathbf{M}(g,\mathbf{b},\mathbf{b}).\) In particular, conjugate matrices have the same eigenvalues.

Proof. Write \(\mathbf{A}=\mathbf{M}(g,\mathbf{b},\mathbf{b}).\) Recall that by an eigenvector of \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) we mean an eigenvector of \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\) By Definition 8.10, we have \(f_\mathbf{A}=\boldsymbol{\beta}\circ g \circ \boldsymbol{\beta}^{-1}.\) Suppose \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) so that \(g(v)=\lambda v\) for some non-zero vector \(v\in V.\) Consider the vector \(\vec{x}=\boldsymbol{\beta}(v) \in \mathbb{K}^n\) which is non-zero, since \(\boldsymbol{\beta}: V \to \mathbb{K}^n\) is an isomorphism. Then \[f_\mathbf{A}(\vec{x})=\boldsymbol{\beta}(g(\boldsymbol{\beta}^{-1}(\vec{x})))=\boldsymbol{\beta}(g(v))=\boldsymbol{\beta}(\lambda v)=\lambda \boldsymbol{\beta}(v)=\lambda \vec{x},\] so that \(\vec{x}\) is an eigenvector of \(f_\mathbf{A}\) with eigenvalue \(\lambda.\)

Conversely, if \(\lambda\) is an eigenvalue of \(f_\mathbf{A}\) with non-zero eigenvector \(\vec{x},\) then it follows as above that \(v=\boldsymbol{\beta}^{-1}(\vec{x}) \in V\) is an eigenvector of \(g\) with eigenvalue \(\lambda.\)

By Remark 11.4, if the matrices \(\mathbf{A},\) \(\mathbf{B}\) are similar, then they represent the same endomorphism \(g : \mathbb{K}^n \to \mathbb{K}^n\) and hence have the same eigenvalues.

The “nicest” endomorphisms are those for which there exists an ordered basis consisting of eigenvectors:

Definition 11.30 • Diagonalisable endomorphism

  • An endomorphism \(g : V \to V\) is called diagonalisable if there exists an ordered basis \(\mathbf{b}\) of \(V\) such that each element of \(\mathbf{b}\) is an eigenvector of \(g.\)

  • For \(n \in \mathbb{N},\) a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is called diagonalisable over \(\mathbb{K}\) if the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n\) is diagonalisable.

Example 11.31

  1. We consider \(V=\mathsf{P}(\mathbb{R})\) and the endomorphism \(g : V \to V\) which replaces the variable \(x\) with \(2x.\) For instance, we have \[g(x^2-2x+3)=(2x)^2-2(2x)+3=4x^2-4x+3.\] Then \(g\) is diagonalisable. The vector space \(\mathsf{P}(\mathbb{R})\) has an ordered basis \(\mathbf{b}=(1,x,x^2,x^3,\ldots).\) Clearly, for all \(k \in \mathbb{N}\cup\{0\}\) we have \(g(x^k)=2^kx^k,\) so that \(x^k\) is an eigenvector of \(g\) with eigenvalue \(2^k.\)

  2. For \(\alpha \in (0, \pi)\) consider \[\mathbf{R}_{\alpha}=\begin{pmatrix} \cos \alpha & -\sin \alpha \\ \sin\alpha & \cos \alpha \end{pmatrix}.\] Recall that the endomorphism \(f_{\mathbf{R}_{\alpha}} : \mathbb{R}^2 \to \mathbb{R}^2\) rotates vectors counter-clockwise around the origin \(0_{\mathbb{R}^2}\) by the angle \(\alpha.\) Since \(\alpha \in (0,\pi),\) the endomorphism \(f_{\mathbf{R}_{\alpha}}\) has no eigenvectors and hence is not diagonalisable.
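Numerically, the absence of eigenvectors of \(f_{\mathbf{R}_{\alpha}}\) shows up in the fact that the eigenvalues of \(\mathbf{R}_{\alpha},\) computed over the complex numbers, are non-real. A small NumPy check (the angle is an illustrative choice):

```python
import numpy as np

alpha = np.pi / 3                      # any angle in (0, pi)
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])

eigenvalues = np.linalg.eigvals(R)     # cos(alpha) ± i sin(alpha)
print(eigenvalues)
print(np.all(np.abs(eigenvalues.imag) > 1e-12))   # True: no real eigenvalues
```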

Remark 11.32

Applying Proposition 11.29, we conclude that in the case of a finite dimensional \(\mathbb{K}\)-vector space \(V,\) an endomorphism \(g : V \to V\) is diagonalisable if and only if there exists an ordered basis \(\mathbf{b}\) of \(V\) such that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is a diagonal matrix. Moreover, \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is diagonalisable if and only if \(\mathbf{A}\) is similar over \(\mathbb{K}\) to a diagonal matrix.

Recall, if \(\mathcal{X},\mathcal{Y}\) are sets, \(f : \mathcal{X} \to \mathcal{Y}\) a mapping and \(\mathcal{Z}\subset \mathcal{X}\) a subset of \(\mathcal{X},\) we can consider the restriction of \(f\) to \(\mathcal{Z}\), usually denoted by \(f|_\mathcal{Z},\) which is the mapping \[f|_\mathcal{Z} : \mathcal{Z} \to \mathcal{Y}, \quad z \mapsto f(z).\] So we simply take the same mapping \(f,\) but apply it to the elements of the subset only.

Closely related to the notion of an eigenvector is that of a stable subspace. Let \(v \in V\) be an eigenvector with eigenvalue \(\lambda\) of the endomorphism \(g : V \to V.\) The \(1\)-dimensional subspace \(U=\operatorname{span}\{v\}\) is stable under \(g,\) that is, \(g(U)\subset U.\) Indeed, since \(g(v)=\lambda v\) and since every vector \(u \in U\) can be written as \(u=t v\) for some scalar \(t \in \mathbb{K},\) we have \(g(u)=g(t v)=t g(v)=t\lambda v \in U.\) This motivates the following definition:

Definition 11.33 • Stable subspace

A subspace \(U\subset V\) is called stable or invariant under the endomorphism \(g : V \to V\) if \(g(U) \subset U,\) that is \(g(u) \in U\) for all vectors \(u \in U.\) In this case, the restriction \(g|_{U}\) of \(g\) to \(U\) is an endomorphism of \(U.\)

Remark 11.34

Notice that a finite dimensional subspace \(U\subset V\) is stable under \(g\) if and only if \(g(v_i) \in U\) for \(1\leqslant i\leqslant m,\) where \(\{v_1,\ldots,v_m\}\) is a basis of \(U.\)

Example 11.35

  1. Every eigenspace of an endomorphism \(g : V \to V\) is a stable subspace. By definition \(g|_{\operatorname{Eig}_{g}(\lambda)} : \operatorname{Eig}_{g}(\lambda) \to \operatorname{Eig}_{g}(\lambda)\) is multiplication by the scalar \(\lambda \in \mathbb{K}.\)

  2. We consider \(V=\mathbb{R}^3\) and \[\mathbf{R}_{\alpha}=\begin{pmatrix} \cos \alpha & -\sin \alpha & 0 \\ \sin \alpha & \cos \alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\] for \(\alpha \in (0,\pi).\) The endomorphism \(f_{\mathbf{R}_\alpha} : \mathbb{R}^3 \to \mathbb{R}^3\) is the rotation by the angle \(\alpha\) around the axis spanned by \(\vec{e}_3.\) Then the plane \(U=\{\vec{x}=(x_i)_{1\leqslant i\leqslant 3} \in \mathbb{R}^3 | x_3=0\}\) is stable under \(f=f_{\mathbf{R}_{\alpha}}.\) Here \(f|_{U} : U \to U\) is the rotation in the plane \(U\) around the origin by the angle \(\alpha.\)

    Moreover, the vector \(\vec{e}_3\) is an eigenvector with eigenvalue \(1\) so that \[\operatorname{Eig}_{f}(1)=\operatorname{span}\{\vec{e}_3\}.\]

  3. We consider again the \(\mathbb{R}\)-vector space \(\mathsf{P}(\mathbb{R})\) of polynomials and \(f=\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) differentiation with respect to the variable \(x.\) For \(n \in \mathbb{N}\) let \(U_n\) denote the subspace of polynomials of degree at most \(n.\) Since \(f(U_n)\subset U_{n-1}\subset U_n,\) the subspace \(U_n\) is stable under \(f.\)

Stable subspaces correspond to zero blocks in the matrix representation of linear maps. More precisely:

Proposition 11.36

Let \(V\) be a \(\mathbb{K}\)-vector space of dimension \(n \in \mathbb{N}\) and \(g : V \to V\) an endomorphism. Furthermore, let \(U\subset V\) be a subspace of dimension \(m\) with \(1\leqslant m\leqslant n,\) let \(\mathbf{b}\) be an ordered basis of \(U,\) and let \(\mathbf{c}=(\mathbf{b},\mathbf{b}^{\prime})\) be an ordered basis of \(V.\) Then \(U\) is stable under \(g\) if and only if the matrix \(\mathbf{A}=\mathbf{M}(g,\mathbf{c},\mathbf{c})\) has the form \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}\] for some matrix \(\hat{\mathbf{A}} \in M_{m,m}(\mathbb{K}).\) In the case where \(U\) is stable under \(g,\) we have \(\hat{\mathbf{A}}=\mathbf{M}(g|_U,\mathbf{b},\mathbf{b}) \in M_{m,m}(\mathbb{K}).\)

Proof. Write \(\mathbf{b}=(v_1,\ldots,v_m)\) for vectors \(v_i \in U\) and \(\mathbf{b}^{\prime}=(w_1,\ldots,w_{n-m})\) for vectors \(w_i \in V.\)

\(\Rightarrow\) Since \(U\) is stable under \(g,\) we have \(g(u) \in U\) for all vectors \(u \in U.\) Since \(\mathbf{b}\) is a basis of \(U,\) there exist scalars \(\hat{A}_{ij} \in \mathbb{K}\) with \(1\leqslant i,j\leqslant m\) such that \[g(v_j)=\sum_{i=1}^m \hat{A}_{ij}v_i\] for all \(1\leqslant j\leqslant m.\) By Proposition 8.11, the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}=(\mathbf{b},\mathbf{b}^{\prime})\) of \(V\) thus takes the form \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}\] where we write \(\hat{\mathbf{A}}=(\hat{A}_{ij})_{1\leqslant i,j\leqslant m}=\mathbf{M}(g|_U,\mathbf{b},\mathbf{b}).\)

\(\Leftarrow\) Suppose \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}=\mathbf{M}(g,\mathbf{c},\mathbf{c})\] is the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}\) of \(V.\) Write \(\hat{\mathbf{A}}=(\hat{A}_{ij})_{1\leqslant i,j\leqslant m}.\) Then, by Proposition 8.11, \(g(v_j)=\sum_{i=1}^m \hat{A}_{ij}v_i \in U\) for all \(1\leqslant j\leqslant m,\) hence \(U\) is stable under \(g,\) by Remark 11.34.
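As an illustration of the block form, for a matrix with a zero lower-left \((n-m)\times m\) block the stability of \(U=\operatorname{span}\{\vec{e}_1,\ldots,\vec{e}_m\}\subset\mathbb{K}^n\) under \(f_\mathbf{A}\) can be seen directly: the last \(n-m\) entries of \(\mathbf{A}\vec{e}_j\) vanish for \(j=1,\ldots,m.\) A small NumPy sketch with \(n=4\) and \(m=2\) (the entries are illustrative):

```python
import numpy as np

n, m = 4, 2
A = np.array([[1.0, 2.0, 5.0, 6.0],
              [3.0, 4.0, 7.0, 8.0],
              [0.0, 0.0, 9.0, 1.0],   # zero (n-m) x m block in the lower left
              [0.0, 0.0, 2.0, 3.0]])

# f_A maps span{e_1, ..., e_m} into itself: the last n-m coordinates of
# A @ e_j vanish for j = 1, ..., m
for j in range(m):
    e_j = np.eye(n)[:, j]
    print(np.allclose((A @ e_j)[m:], 0))   # True, True
```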

From Proposition 11.36 we can conclude:

Remark 11.37

Suppose \(V\) is the direct sum of subspaces \(U_1,\) \(U_2,\ldots,U_m,\) all of which are stable under the endomorphism \(g : V \to V.\) If \(\mathbf{b}_i\) is an ordered basis of \(U_i\) for \(i=1,\ldots,m,\) then the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}=(\mathbf{b}_1,\ldots,\mathbf{b}_m)\) takes the block form \[\mathbf{A}=\begin{pmatrix} \mathbf{A}_1 & && \\ & \mathbf{A}_2 && \\ &&\ddots & \\ &&& \mathbf{A}_m \end{pmatrix}\] where \(\mathbf{A}_i=\mathbf{M}(g|_{U_i},\mathbf{b}_i,\mathbf{b}_i)\) for \(i=1,\ldots,m.\)

11.4 The characteristic polynomial

The eigenvalues of an endomorphism are the solutions of a polynomial equation:

Lemma 11.38

Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. Then \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if \[\det\left(\lambda \mathrm{Id}_{V}-g\right)=0.\] Moreover if \(\lambda\) is an eigenvalue of \(g,\) then \(\operatorname{Eig}_{g}(\lambda)=\operatorname{Ker}(\lambda \mathrm{Id}_V-g).\)

Proof. Let \(v \in V.\) We may write \(v=\mathrm{Id}_V(v).\) Hence \[g(v)=\lambda v \quad \iff \quad 0_V=(\lambda \mathrm{Id}_V-g)(v) \quad \iff \quad v \in \operatorname{Ker}(\lambda\mathrm{Id}_V-g).\] It follows that \(\operatorname{Eig}_{g}(\lambda)=\operatorname{Ker}(\lambda \mathrm{Id}_V-g).\) Moreover, \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if the kernel of \(\lambda\mathrm{Id}_V-g\) is different from \(\{0_V\},\) that is, if and only if \(\lambda\mathrm{Id}_V-g\) is not injective. Proposition 11.12 implies that \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if \(\det\left(\lambda \mathrm{Id}_{V}-g\right)=0.\)

Definition 11.39 • Characteristic polynomial

Let \(g : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) The function \[\operatorname{char}_g : \mathbb{K}\to \mathbb{K}, \quad x\mapsto \det\left(x\mathrm{Id}_V -g\right)\] is called the characteristic polynomial of the endomorphism \(g\).

In practice, in order to compute the characteristic polynomial of an endomorphism \(g : V \to V,\) we choose an ordered basis \(\mathbf{b}\) of \(V\) and compute the matrix representation \(\mathbf{A}=\mathbf{M}(g,\mathbf{b},\mathbf{b})\) of \(g\) with respect to \(\mathbf{b}.\) We then have \[\operatorname{char}_g(x)=\det\left(x\mathbf{1}_{n}-\mathbf{A}\right).\] By the characteristic polynomial of a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) we mean the characteristic polynomial of the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n,\) that is, the function \(x\mapsto \det\left(x\mathbf{1}_{n}-\mathbf{A}\right).\)
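For a matrix with numerical entries, the coefficients and zeros of the characteristic polynomial can be computed with NumPy: np.poly returns the coefficients of \(\det\left(x\mathbf{1}_{n}-\mathbf{A}\right)\) starting with the leading one, and np.roots recovers its zeros, which agree with the eigenvalues returned by np.linalg.eigvals. A minimal sketch using the matrix from Example 11.8 (an illustrative check only):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0]])

coeffs = np.poly(A)       # coefficients of det(x*1_n - A), leading term first
print(coeffs)

# the zeros of the characteristic polynomial are exactly the eigenvalues
print(np.roots(coeffs))   # agrees with the eigenvalues, up to ordering/rounding
print(np.linalg.eigvals(A))
```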

A zero of a polynomial \(f : \mathbb{K}\to \mathbb{K}\) is a scalar \(\lambda\in \mathbb{K}\) such that \(f(\lambda)=0.\) The multiplicity of a zero \(\lambda\) is the largest integer \(n\geqslant 1\) such that there exists a polynomial \(\hat{f} : \mathbb{K}\to \mathbb{K}\) so that \(f(x)=(x-\lambda)^n\hat{f}(x)\) for all \(x \in \mathbb{K}.\) Zeros are also known as roots.

Example 11.40

The polynomial \(f(x)=x^3-x^2-8x+12\) can be factorised as \(f(x)=(x-2)^2(x+3)\) and hence has zero \(2\) with multiplicity \(2\) and \(-3\) with multiplicity \(1.\)

Definition 11.41 • Algebraic multiplicity

Let \(\lambda\) be an eigenvalue of the endomorphism \(g : V \to V.\) The multiplicity of the zero \(\lambda\) of \(\operatorname{char}_g\) is called the algebraic multiplicity of \(\lambda\).

Example 11.42

  1. We consider \[\mathbf{A}=\begin{pmatrix} 1 & 5 \\ 5 & 1 \end{pmatrix}.\] Then \[\begin{aligned} \operatorname{char}_{\mathbf{A}}(x)&=\operatorname{char}_{f_\mathbf{A}}(x)=\det\left(x\mathbf{1}_{2}-\mathbf{A}\right)=\det\begin{pmatrix} x-1 & -5 \\ -5 & x-1\end{pmatrix}\\ &=(x-1)^2-25=x^2-2x-24=(x+4)(x-6). \end{aligned}\] Hence we have eigenvalues \(\lambda_1=6\) and \(\lambda_2=-4,\) both with algebraic multiplicity \(1.\) By definition we have \[\operatorname{Eig}_{\mathbf{A}}(6)=\operatorname{Eig}_{f_\mathbf{A}}(6)=\left\{\vec{v} \in \mathbb{K}^2 | \mathbf{A}\vec{v}=6\vec{v}\right\}\] and we compute that \[\operatorname{Eig}_{\mathbf{A}}(6)=\operatorname{span}\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\}.\] Since \(\dim \operatorname{Eig}_\mathbf{A}(6)=1,\) the eigenvalue \(6\) has geometric multiplicity \(1.\) Likewise we compute \[\operatorname{Eig}_\mathbf{A}(-4)=\operatorname{span}\left\{\begin{pmatrix} -1 \\ 1 \end{pmatrix}\right\},\] so that the eigenvalue \(-4\) has geometric multiplicity \(1\) as well. Notice that we have an ordered basis of eigenvectors of \(\mathbf{A}\) and hence \(\mathbf{A}\) is diagonalisable, cf. Example 8.15.

  2. We consider \[\mathbf{A}=\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\] Then \(\operatorname{char}_\mathbf{A}(x)=(x-2)^2\) so that we have a single eigenvalue \(2\) with algebraic multiplicity \(2.\) We compute \[\operatorname{Eig}_\mathbf{A}(2)=\operatorname{span}\left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right\}\] so that the eigenvalue \(2\) has geometric multiplicity \(1.\) Notice that we cannot find an ordered basis consisting of eigenvectors, hence \(\mathbf{A}\) is not diagonalisable.
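The computations in Example 11.42 can be double-checked numerically; the sketch below recovers the eigenvalues and, via the rank of \(\lambda\mathbf{1}_{2}-\mathbf{A},\) the geometric multiplicities (an illustrative check only):

```python
import numpy as np

A1 = np.array([[1.0, 5.0],
               [5.0, 1.0]])
print(np.sort(np.linalg.eigvals(A1)))                  # [-4.  6.]
# geometric multiplicity of 6: dim Ker(6*1_2 - A1) = 2 - rank(6*1_2 - A1)
print(2 - np.linalg.matrix_rank(6 * np.eye(2) - A1))   # 1

A2 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
print(np.linalg.eigvals(A2))                           # [2. 2.]
print(2 - np.linalg.matrix_rank(2 * np.eye(2) - A2))   # 1: no basis of eigenvectors
```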

The determinant and trace of an endomorphism appear as coefficients in its characteristic polynomial:

Lemma 11.43

Let \(g : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V\) of dimension \(n.\) Then \(\operatorname{char}_g\) is a polynomial of degree \(n\) and \[\operatorname{char}_g(x)=x^n-\operatorname{Tr}(g)x^{n-1}+\cdots +(-1)^n\det(g).\]

Proof. We fix an ordered basis \(\mathbf{b}\) of \(V.\) Writing \(\mathbf{M}(g,\mathbf{b},\mathbf{b})=\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n}\) and using the Leibniz formula (10.3), we have \[\operatorname{char}_g(x)=\sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^nB_{i\sigma(i)},\] where \[B_{ij}=\left\{\begin{array}{cc} x-A_{ii}, & i=j,\\ -A_{ij}, & i\neq j.\end{array}\right.\] Therefore, \(\operatorname{char}_g\) is a finite sum of products containing \(x\) at most \(n\) times, hence \(\operatorname{char}_g\) is a polynomial in \(x\) of degree at most \(n.\) The identity permutation contributes the term \(\prod_{i=1}^nB_{ii}\) in the Leibniz formula, hence we obtain \[\operatorname{char}_g(x)=\prod_{i=1}^n(x-A_{ii})+\sum_{\sigma \in S_n,\sigma \neq 1}\operatorname{sgn}(\sigma)\prod_{i=1}^nB_{i\sigma(i)}.\] We now use induction to show that \[\prod_{i=1}^n(x-A_{ii})=x^n-\operatorname{Tr}(\mathbf{A})x^{n-1}+c_{n-2}x^{n-2}+\cdots +c_1x+c_0\] for scalars \(c_{n-2},\ldots,c_0 \in \mathbb{K}.\) For \(n=1\) the claim is simply \(x - A_{11} = x - A_{11},\) so the base case of the induction holds.

Inductive step: Suppose \[\prod_{i=1}^{n-1}(x-A_{ii})=x^{n-1}-\left(\sum_{i=1}^{n-1} A_{ii}\right)x^{n-2}+c_{n-3}x^{n-3}+\cdots +c_1x+c_0,\] for coefficients \(c_{n-3},\ldots,c_0,\) then \[\begin{aligned} \prod_{i=1}^{n}(x-A_{ii})&=(x-A_{nn})\left[x^{n-1}-\left(\sum_{i=1}^{n-1} A_{ii}\right)x^{n-2}+c_{n-3}x^{n-3}+\cdots +c_1x+c_0\right]\\ &=x^{n}-\left(\sum_{i=1}^{n}A_{ii}\right)x^{n-1}+\text{ lower order terms in }x, \end{aligned}\] so the induction is complete.

We next argue that \(\sum_{\sigma \in S_n,\sigma \neq 1}\operatorname{sgn}(\sigma)\prod_{i=1}^nB_{i\sigma(i)}\) has degree at most \(n-2.\) Notice that each factor \(B_{i\sigma(i)}\) of \(\prod_{i=1}^nB_{i\sigma(i)}\) for which \(i\neq \sigma(i)\) does not contain \(x.\) So suppose that for some \(\sigma\neq 1\) the product \(\prod_{i=1}^nB_{i\sigma(i)}\) has degree greater than or equal to \(n-1.\) Then there are at least \(n-1\) integers \(i\) with \(1\leqslant i\leqslant n\) such that \(i=\sigma(i).\) Let \(j\) denote the remaining integer. Since \(\sigma\) is injective, it follows that for any \(i\neq j\) we must have \(i=\sigma(i)\neq \sigma(j).\) Therefore, \(\sigma(j)=j\) and hence \(\sigma=1,\) a contradiction. Consequently, every term with \(\sigma\neq 1\) has degree at most \(n-2,\) and so does the sum.

In summary, we have shown that \[\operatorname{char}_g(x)=x^n-\operatorname{Tr}(g)x^{n-1}+c_{n-2}x^{n-2}+\cdots+c_1x+c_0\] for coefficients \(c_{n-2},\ldots,c_0 \in \mathbb{K}.\) It remains to show that \(c_0=(-1)^n\det(g).\) We have \(c_0=\operatorname{char}_g(0)=\det(-g)=\det(-\mathbf{A}).\) Since the determinant is linear in each row of \(\mathbf{A},\) this gives \(\det(-\mathbf{A})=(-1)^n\det(\mathbf{A}),\) as claimed.
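Lemma 11.43 can also be verified numerically: in the coefficient list returned by np.poly, the coefficient of \(x^{n-1}\) should be \(-\operatorname{Tr}(\mathbf{A})\) and the constant term should be \((-1)^n\det(\mathbf{A}).\) A minimal sketch with a random matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))

# [1, c_{n-1}, ..., c_0] with char(x) = x^n + c_{n-1} x^{n-1} + ... + c_0
coeffs = np.poly(A)

print(np.isclose(coeffs[1], -np.trace(A)))                    # True
print(np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A)))   # True
```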

Remark 11.44

In particular, for \(n=2\) we have \(\operatorname{char}_g(x)=x^2-\operatorname{Tr}(g)x+\det(g).\) Compare with Example 11.42.


Exercises

Exercise 11.45

Let \(U\) and \(U'\) be the subspaces of \(\mathbb{R}^3\) with bases \[\left\{ \begin{pmatrix} 1 \\ 1 \\ 1\end{pmatrix}, \begin{pmatrix}1 \\ 2 \\2 \end{pmatrix}\right\} \quad\text{and}\quad \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\right\}\] respectively. Show that \(U\) and \(U'\) are complementary subspaces of \(\mathbb{R}^3.\)

Solution

Since both basis vectors of \(U\) have their 2nd and 3rd entries equal, this is true of all vectors in \(U.\) Hence \(U'\) is not contained in \(U\); and as \(U'\) is one-dimensional, this implies that \(U \cap U'\) must be \(\{0\}\) (so the sum is direct); and \(\dim (U + U') = \dim(U) + \dim(U') = 3\) (so the sum is all of \(\mathbb{R}^3\)). Hence \(\mathbb{R}^3 = U \oplus U'\) as required.
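The conclusion can also be confirmed numerically: by Proposition 11.20 (i), \(\mathbb{R}^3 = U\oplus U'\) precisely when the three basis vectors together form a basis of \(\mathbb{R}^3,\) that is, when the matrix having them as columns has rank \(3.\) A quick NumPy check:

```python
import numpy as np

# the basis vectors of U and U' as columns of one matrix
B = np.column_stack([[1.0, 1.0, 1.0],
                     [1.0, 2.0, 2.0],
                     [1.0, 2.0, 3.0]])

print(np.linalg.matrix_rank(B))   # 3, so R^3 = U ⊕ U'
```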

Exercise 11.46

Compute the characteristic polynomials and eigenvalues of the following matrices, and bases of their eigenspaces: \[(i)\ \begin{pmatrix}7 & -4 \\ -8 & -7 \end{pmatrix},\quad (ii)\ \begin{pmatrix}5 & 2 & 3 \\ -13 & -6 & -11 \\ 4 & 2 & 4 \end{pmatrix}, \quad (iii)\ \begin{pmatrix}3 & 1 & 1 \\ -15 & -5 & -5 \\ 6 & 2 & 2 \end{pmatrix}.\]

Which of them are diagonalisable?

Solution

(i): we have \(\operatorname{char}_A(x) = (x - 7)(x + 7) - 32 = x^2 - 81 = (x - 9)(x + 9),\) so the eigenvalues are \(\pm 9.\)

The \(\lambda = 9\) eigenspace is the kernel of \(9 \cdot 1_2 - A = \begin{pmatrix} 2 & 4 \\ 8 & 16\end{pmatrix},\) which is the 1-dimensional space spanned by \(\begin{pmatrix} 2 \\ -1\end{pmatrix}.\) Similarly, the \(\lambda = -9\) eigenspace is the kernel of \(-9 \cdot 1_2 - A = \begin{pmatrix}-16 & 4 \\ 8 & -2 \end{pmatrix},\) which is 1-dimensional spanned by \(\begin{pmatrix} 1 \\ 4\end{pmatrix}.\)

(ii) We compute that the characteristic polynomial is \(x (x - 1) (x - 2),\) so the eigenvalues are \(\{0, 1, 2\}.\) The eigenspaces are all 1-dimensional, and bases of them are given by the following eigenvectors: \[\lambda = 0 : \begin{pmatrix} 1 \\ -4 \\ 1\end{pmatrix}, \lambda=1 : \begin{pmatrix}1 \\ -5 \\ 2\end{pmatrix}, \lambda=2 : \begin{pmatrix} 1 \\ -3 \\ 1 \end{pmatrix}.\]

(iii) The characteristic polynomial is \(x^3,\) so the only eigenvalue is 0. Hence the eigenspace is just \(\ker A,\) which we compute by taking row echelon form of the transpose (augmented by the identity), as usual:

\[\left(\begin{array}{rrr|rrr} 3 & -15 & 6 & 1 & 0 & 0 \\ 1 & -5 & 2 & 0 & 1 & 0 \\ 1 & -5 & 2 & 0 & 0 & 1 \end{array}\right)\stackrel{RREF}{\leadsto} \left(\begin{array}{rrr|rrr} 1 & -5 & 2 & 0 & 0 & 1 \\ \hline 0 & 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{array}\right)\] so a basis of the eigenspace is \(\left\{ \begin{pmatrix} 1 \\ 0 \\ -3\end{pmatrix}, \begin{pmatrix}0 \\ 1 \\ -1 \end{pmatrix}\right\}.\) (Of course, many other bases are possible.)

In (i) and (ii), the eigenvectors are linearly independent (this is easy to check by hand, and is also an instance of a general theorem from the next chapter) and there are enough of them to form a basis of \(\mathbb{R}^2\) resp. \(\mathbb{R}^3,\) so the matrices are diagonalisable. In (iii), the eigenvectors only span a 2-dimensional subspace of \(\mathbb{R}^3\) so the matrix is not diagonalisable.
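These conclusions can be double-checked numerically. The sketch below takes the eigenvalues found above as given and computes the geometric multiplicities as \(\dim\operatorname{Ker}(\lambda\mathbf{1}_{n}-\mathbf{A})=n-\operatorname{rank}(\lambda\mathbf{1}_{n}-\mathbf{A});\) it then uses the criterion (anticipating the next chapter) that a matrix is diagonalisable over \(\mathbb{R}\) exactly when these multiplicities add up to \(n\) (an illustrative check only):

```python
import numpy as np

cases = [
    (np.array([[7.0, -4.0], [-8.0, -7.0]]),                              [9, -9]),
    (np.array([[5.0, 2.0, 3.0], [-13.0, -6.0, -11.0], [4.0, 2.0, 4.0]]), [0, 1, 2]),
    (np.array([[3.0, 1.0, 1.0], [-15.0, -5.0, -5.0], [6.0, 2.0, 2.0]]),  [0]),
]

for A, eigenvalues in cases:
    n = A.shape[0]
    # geometric multiplicity of lambda: dim Ker(lambda*1_n - A) = n - rank(lambda*1_n - A)
    geo = {lam: n - np.linalg.matrix_rank(lam * np.eye(n) - A) for lam in eigenvalues}
    print(geo, "diagonalisable:", sum(geo.values()) == n)
```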



MCQ 11.1

Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathcal S,\mathcal T\subset V.\) If \(\mathcal S\cup \mathcal T\) is a basis of \(V,\) then \(V=\operatorname{span}(\mathcal S)\oplus\operatorname{span}(\mathcal T).\)

  • True
  • False
MCQ 11.2

Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathcal S,\mathcal T\subset V.\) If \(\mathcal S\cup \mathcal T\) is a basis of \(V\) and \(\mathcal S\cap\mathcal T=\emptyset\) then \(V=\operatorname{span}(\mathcal S)\oplus\operatorname{span}(\mathcal T).\)

  • True
  • False
MCQ 11.3

If \(U_1,U_2,U_3\subset V\) are subspaces of a \(\mathbb{K}\)-vector space, then \(U_1+U_2=U_1+U_3\) implies \(U_2=U_3.\)

  • True
  • False
MCQ 11.4

If \(U_1,U_2,U_3\subset V\) are subspaces of a \(\mathbb{K}\)-vector space, then \(U_1+U_2=U_1+U_3\) implies \(\dim(U_2)=\dim(U_3).\)

  • True
  • False
MCQ 11.5

If \(U_1,U_2,U_3\subset V\) are subspaces of a finite-dimensional \(\mathbb{K}\)-vector space, then \(U_1\oplus U_2=U_1\oplus U_3\) implies \(\dim(U_2)=\dim(U_3).\)

  • True
  • False
MCQ 11.6

If \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A}^T\mathbf{A}) = \operatorname{Tr}(\mathbf{A}\mathbf{A}^T).\)

  • True
  • False
MCQ 11.7

If \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A}+\mathbf{B})=\operatorname{Tr}(\mathbf{A})+\operatorname{Tr}(\mathbf{B}).\)

  • True
  • False
MCQ 11.8

If \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\) is invertible, then \(\operatorname{Tr}(\mathbf{B}\mathbf{A}\mathbf{B}^{-1})=\operatorname{Tr}(\mathbf{A}).\)

  • True
  • False
MCQ 11.9

If \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A})\ne\operatorname{Tr}(\mathbf{A}^T).\)

  • True
  • False
MCQ 11.10

If \(1\) is the only eigenvalue of \(\mathbf{A},\) then \(\mathbf{A}=\mathbf{1}_{n}.\)

  • True
  • False
MCQ 11.11

The eigenvalues of an anti-symmetric matrix \(\mathbf{A}\in M_{2,2}(\mathbb{R})\) are pure imaginary.

  • True
  • False
MCQ 11.12

\(1\) is an eigenvalue of \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p\)

  • True
  • False
MCQ 11.13

The algebraic multiplicity of the eigenvalue \(1\) of \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p\) equals its geometric multiplicity.

  • True
  • False
MCQ 11.14

Let \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p.\) Then \(\dim(\operatorname{Eig}_f(1))=1.\)

  • True
  • False
