10.6 The spectral theorem
We now come to one of the core results of the Linear Algebra II module:
Let \(f : V \to V\) be an endomorphism of the finite dimensional Euclidean space \((V,\langle\cdot{,}\cdot\rangle).\) Then there exists an orthonormal basis of \(V\) consisting of eigenvectors of \(f\) if and only if \(f\) is self-adjoint.
For the proof of this statement we need two lemmas.
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space of dimension \(n\geqslant 1\) and \(f : V \to V\) a self-adjoint endomorphism. Then \(f\) admits an eigenvalue \(\lambda \in \mathbb{R}.\)
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space with induced norm \(\Vert \cdot \Vert : V \to \mathbb{R}.\) Then
for all \(v \in V\) we have \(\Vert v \Vert \geqslant 0\) and \(\Vert v \Vert=0\) if and only if \(v=0_V\);
for all \(s\in \mathbb{R}\) and all \(v \in V\) we have \(\Vert sv\Vert=|s|\Vert v\Vert\);
for all vectors \(v_1,v_2 \in V,\) we have the so-called triangle inequality \[\Vert v_1+v_2\Vert\leqslant \Vert v_1\Vert+\Vert v_2\Vert.\]
Recall that a subspace \(U\subset V\) is said to be stable under an endomorphism \(f : V \to V\) if \(f(u) \in U\) for all \(u \in U.\)
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space, \(f : V \to V\) a self-adjoint endomorphism and \(\lambda\) an eigenvalue of \(f.\) Then \((\operatorname{Eig}_f(\lambda))^{\perp}\) is stable under \(f.\)
Proof. Write \(U=\operatorname{Eig}_f(\lambda)\) and let \(w \in U^{\perp}.\) Then, for all \(u \in U\) we obtain \[\langle u,f(w)\rangle=\langle u,f^*(w)\rangle=\langle f(u),w\rangle=\lambda \langle u,w\rangle,\] where we use the self-adjointness of \(f\) and that \(u\) is an eigenvector of \(f.\) Since \(w \in U^{\perp},\) we have \(\langle u,w\rangle=0\) and hence \(\langle u,f(w)\rangle=0\) for all \(u \in U.\) This shows that \(f(w) \in U^{\perp},\) hence \(U^{\perp}\) is stable under \(f.\)
Conversely, assume that \(f\) is self-adjoint. We use induction on the dimension \(n\) of \(V\) to show that \((V,\langle\cdot{,}\cdot\rangle)\) admits an orthonormal basis consisting of eigenvectors of \(f.\) For \(n=1\) every endomorphism is multiplication by a scalar, so any unit vector of \(V\) forms an orthonormal basis consisting of eigenvectors of \(f\) and the base case holds.
By Lemma 10.14 and Remark 10.3, the restriction of an inner product \(\langle\cdot{,}\cdot\rangle\) on a finite dimensional vector space \(V\) to a subspace \(U\subset V\) is always non-degenerate. Therefore, by Corollary 9.24, the orthogonal subspace \(U^{\perp}\) is always a complement to \(U,\) so that \(V=U\oplus U^{\perp}\) and \[\dim U^{\perp}=\dim V-\dim U\] by Remark 6.7 and Proposition 6.12.
Let \((V,\langle\cdot{,}\cdot\rangle)\) be an \(n\)-dimensional Euclidean space and \(\mathbf{b}=(v_1,\ldots,v_n)\) an ordered basis of \(V.\) For \(2\leqslant i\leqslant n\) we define recursively \[w_i=v_i-\Pi^{\perp}_{U_{i-1}}(v_i)\qquad \text{and}\qquad u_i=\frac{w_i}{\Vert w_i\Vert},\] where \(U_{i-1}=\operatorname{span}\{u_1,\ldots ,u_{i-1}\}\) and \(u_1=v_1/\Vert v_1\Vert.\) Then \(\mathbf{b}^{\prime}=(u_1,\ldots,u_n)\) is well defined and an orthonormal ordered basis of \(V.\) Moreover, \(\mathbf{b}^{\prime}\) is the unique orthonormal ordered basis of \(V\) so that the change of basis matrix \(\mathbf{C}(\mathbf{b}^{\prime},\mathbf{b})\) is an upper triangular matrix with positive diagonal entries.
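As a numerical illustration (a sketch, not part of the notes), the recursion can be implemented in Python with NumPy for the standard scalar product on \(\mathbb{R}^n\); the name `gram_schmidt` is chosen here:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise an ordered basis: subtract from each v_i its
    orthogonal projection onto the span of u_1, ..., u_{i-1}, then
    normalise the result."""
    ortho = []
    for v in vectors:
        w = v - sum(np.dot(u, v) * u for u in ortho)  # w_i = v_i - proj onto U_{i-1}
        ortho.append(w / np.linalg.norm(w))           # u_i = w_i / ||w_i||
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
U = np.column_stack(gram_schmidt(basis))
print(np.allclose(U.T @ U, np.eye(3)))  # True: the columns are orthonormal
```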
Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{R})\) be a matrix. Then there exists an orthogonal matrix \(\mathbf{R}\in M_{n,n}(\mathbb{R})\) such that \(\mathbf{R}\mathbf{A}\mathbf{R}^T\) is a diagonal matrix if and only if \(\mathbf{A}\) is symmetric.
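The matrix version can be checked numerically (an illustrative sketch only) with NumPy's `eigh`, which diagonalises symmetric matrices:

```python
import numpy as np

A = np.array([[2.0, -2.0],
              [-2.0, 5.0]])  # a symmetric matrix

# eigh returns the eigenvalues w (in ascending order) and a matrix V whose
# columns are orthonormal eigenvectors, so that A = V diag(w) V^T.
w, V = np.linalg.eigh(A)
R = V.T  # the orthogonal matrix R of the statement

print(np.allclose(R @ R.T, np.eye(2)))       # True: R is orthogonal
print(np.allclose(R @ A @ R.T, np.diag(w)))  # True: R A R^T is diagonal
```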
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we have by definition \((\mathbf{A}^T)^T=\mathbf{A}.\)
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K}),\) we have \[(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T.\] Indeed, by definition we have for all \(1\leqslant i\leqslant {\tilde{m}}\) and all \(1\leqslant j\leqslant m\) \[\left[(\mathbf{A}\mathbf{B})^T\right]_{ij}=[\mathbf{A}\mathbf{B}]_{ji}=\sum_{k=1}^n [\mathbf{A}]_{jk}[\mathbf{B}]_{ki}=\sum_{k=1}^n\left[\mathbf{B}^T\right]_{ik}\left[\mathbf{A}^T\right]_{kj}=\left[\mathbf{B}^T\mathbf{A}^T\right]_{ij}.\]
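The rule \((\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T\) is easy to confirm numerically (a sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # A in M_{3,4}(R)
B = rng.standard_normal((4, 2))  # B in M_{4,2}(R)

# (AB)^T equals B^T A^T, matching the entrywise computation above
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```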
Let \(V,W\) be finite dimensional \(\mathbb{K}\)-vector spaces and \(\mathbf{b},\mathbf{b}^{\prime}\) ordered bases of \(V\) and \(\mathbf{c},\mathbf{c}^{\prime}\) ordered bases of \(W.\) Let \(g : V \to W\) be a linear map. Then we have \[\mathbf{M}(g,\mathbf{b}^{\prime},\mathbf{c}^{\prime})=\mathbf{C}(\mathbf{c},\mathbf{c}^{\prime})\mathbf{M}(g,\mathbf{b},\mathbf{c})\mathbf{C}(\mathbf{b}^{\prime},\mathbf{b})\] In particular, for a linear map \(g : V \to V\) we have \[\mathbf{M}(g,\mathbf{b}^{\prime},\mathbf{b}^{\prime})=\mathbf{C}\,\mathbf{M}(g,\mathbf{b},\mathbf{b})\,\mathbf{C}^{-1},\] where we write \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime}).\)
Applying Proposition 6.27, we conclude that in the case of a finite dimensional \(\mathbb{K}\)-vector space \(V,\) an endomorphism \(g : V \to V\) is diagonalisable if and only if there exists an ordered basis \(\mathbf{b}\) of \(V\) such that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is a diagonal matrix. Moreover, \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is diagonalisable if and only if \(\mathbf{A}\) is similar over \(\mathbb{K}\) to a diagonal matrix.
Let \(\mathbf{e}=(\vec{e}_1,\ldots,\vec{e}_n)\) and \(\mathbf{d}=(\vec{d}_1,\ldots,\vec{d}_m)\) denote the ordered standard basis of \(\mathbb{K}^n\) and \(\mathbb{K}^m,\) respectively. Then for \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) we have \[\mathbf{A}=\mathbf{M}(f_\mathbf{A},\mathbf{e},\mathbf{d}),\] that is, the matrix representation of the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) with respect to the standard bases is simply the matrix \(\mathbf{A}.\) Indeed, we have \[f_\mathbf{A}(\vec{e}_j)=\mathbf{A}\vec{e}_j=\begin{pmatrix} A_{1j} \\ \vdots \\ A_{mj} \end{pmatrix}=\sum_{i=1}^m A_{ij}\vec{d}_i.\]
Let \(n \in \mathbb{N}\) and \((V,\langle\cdot{,}\cdot\rangle)\) be an \(n\)-dimensional Euclidean space equipped with an orthonormal ordered basis \(\mathbf{b}.\) Then an ordered basis \(\mathbf{b}^{\prime}\) of \(V\) is orthonormal with respect to \(\langle\cdot{,}\cdot\rangle\) if and only if the change of basis matrix \(\mathbf{C}(\mathbf{b}^{\prime},\mathbf{b})\) is orthogonal.
10.6.1 Geometric description of self-adjoint endomorphisms
The spectral theorem tells us that self-adjoint endomorphisms can be diagonalised with an orthonormal basis. As a consequence one can give a precise geometric description of self-adjoint mappings. A first key observation towards this end is the following:
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(f : V \to V\) a self-adjoint endomorphism. Then the eigenspaces of \(f\) are orthogonal. That is, for eigenvalues \(\lambda\neq \mu\) of \(f\) we have \(\langle u,v\rangle=0\) for all \(u \in \operatorname{Eig}_f({\lambda})\) and for all \(v \in \operatorname{Eig}_f({\mu}).\)
Proof. Let \(u \in \operatorname{Eig}_f({\lambda})\) and \(v \in \operatorname{Eig}_f({\mu}).\) Then \[\lambda\langle u,v\rangle=\langle f(u),v\rangle=\langle u,f(v)\rangle=\mu \langle u,v\rangle\] and hence \(0=(\lambda-\mu)\langle u,v\rangle.\) It follows that \(\langle u,v\rangle=0\) since \(\lambda-\mu \neq 0.\)
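The lemma can be observed numerically (illustration only) for a symmetric matrix with distinct eigenvalues; `np.linalg.eig` is used deliberately, since unlike `eigh` it does not enforce orthogonality of the computed eigenvectors:

```python
import numpy as np

# Symmetric matrix with three distinct eigenvalues (4 and (-3 ± sqrt(13))/2)
A = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, -3.0],
              [0.0, -3.0, 1.0]])

# eig does not impose orthogonality, so the orthogonality observed below
# reflects the lemma, not the algorithm.
w, V = np.linalg.eig(A)
u, v = V[:, 0], V[:, 1]  # eigenvectors for two distinct eigenvalues
print(abs(np.dot(u, v)) < 1e-8)  # True
```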
Recall that a vector space \(V\) is the direct sum of vector subspaces \(U_1,\ldots ,U_k\) of \(V\) if every vector \(v \in V\) can be written uniquely as a sum \(v=u_1+u_2+\cdots+u_k\) with \(u_i \in U_i\) for \(1\leqslant i \leqslant k.\) In this case we write \(V=\bigoplus_{i=1}^k U_i.\) In the presence of an inner product on \(V,\) we may ask that the subspaces \(U_i\) are all orthogonal:
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(U_1,\ldots,U_k\) be subspaces of \(V\) such that \(V=\bigoplus_{i=1}^k U_i.\) We say \(V\) is the orthogonal direct sum of the subspaces \(U_1,\ldots ,U_k\) if for all \(i\neq j,\) we have \(\langle u_i,u_j\rangle=0\) for all \(u_i \in U_i\) and for all \(u_j \in U_j.\) In this case we write \[V=\bigoplus_{i=1}^k\!{}^{\perp}\, U_i.\]
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(U\subset V\) a subspace. Then \(V\) is the orthogonal direct sum of \(U\) and \(U^{\perp}.\)
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(\{u_1,\ldots,u_n\}\) an orthogonal basis of \(V.\) Then \(V\) is the orthogonal direct sum of the subspaces \(U_i=\operatorname{span}\{u_i\}\) for \(1\leqslant i\leqslant n.\)
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space and \(f : V \to V\) a self-adjoint endomorphism. Let \(\{\lambda_1,\ldots,\lambda_k\}\) denote the eigenvalues of \(f.\) Then \[V=\bigoplus_{i=1}^k\!{}^{\perp}\, \operatorname{Eig}_f(\lambda_i).\]
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g :V \to V\) an endomorphism. Then the eigenspaces \(\operatorname{Eig}_{g}(\lambda)\) of \(g\) are in direct sum. In particular, if \(v_1,\ldots,v_m\) are eigenvectors corresponding to distinct eigenvalues of \(g,\) then \(\{v_1,\ldots,v_m\}\) are linearly independent.
We now obtain the aforementioned geometric description: a self-adjoint endomorphism of a finite dimensional Euclidean space is a linear combination of orthogonal projections.
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space and \(f : V \to V\) a self-adjoint endomorphism with eigenvalues \(\{\lambda_1,\ldots,\lambda_{k}\}.\) Then we have for all \(v \in V\) \[f(v)=\sum_{i=1}^k\lambda_i\Pi^{\perp}_{U_i}(v),\] where we write \(U_i=\operatorname{Eig}_f(\lambda_i).\)
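As a sketch (assuming the standard scalar product on \(\mathbb{R}^2\)), the decomposition can be checked with NumPy: for one-dimensional eigenspaces, the orthogonal projection onto the span of a unit eigenvector \(v_i\) has matrix \(v_iv_i^T\):

```python
import numpy as np

A = np.array([[2.0, -2.0],
              [-2.0, 5.0]])  # symmetric, hence self-adjoint
w, V = np.linalg.eigh(A)    # orthonormal eigenvectors in the columns of V

# Reconstruct A as sum_i lambda_i * (projection onto Eig(lambda_i))
recon = sum(w[i] * np.outer(V[:, i], V[:, i]) for i in range(len(w)))
print(np.allclose(recon, A))  # True
```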
10.7 Quadratic forms
Closely related to the notion of a symmetric bilinear form is that of a quadratic form.
A function \(q : V \to \mathbb{R}\) is called a quadratic form on \(V\) if there exists a symmetric bilinear form \(\langle\cdot{,}\cdot\rangle\) on \(V\) such that \[q(v)=\langle v,v\rangle\] for all \(v \in V.\)
The adjective quadratic is used since a quadratic form \(q : V\to \mathbb{R}\) is \(2\)-homogeneous, that is, it satisfies \[q(sv)=s^2 q(v)\] for all \(s\in \mathbb{R}\) and \(v \in V.\)
By definition, every symmetric bilinear form \(\langle\cdot{,}\cdot\rangle\) on \(V\) gives rise to a quadratic form \(q.\) The mapping \(\langle\cdot{,}\cdot\rangle\mapsto q\) from the set of symmetric bilinear forms into the set of quadratic forms is thus surjective. That this mapping is also injective is a consequence of the so-called polarisation identity \[4\langle v_1,v_2\rangle=\langle v_1+v_2,v_1+v_2\rangle-\langle v_1-v_2,v_1-v_2\rangle\] which holds for all \(v_1,v_2 \in V.\) Written in terms of the quadratic form associated to \(\langle\cdot{,}\cdot\rangle,\) it becomes \[4\langle v_1,v_2\rangle=q(v_1+v_2)-q(v_1-v_2).\] Therefore, if two symmetric bilinear forms define the same quadratic form, then they must agree.
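A quick numerical check of the polarisation identity (illustration only), with \(q\) taken to be the quadratic form of the standard scalar product on \(\mathbb{R}^4\):

```python
import numpy as np

def q(v):
    """Quadratic form of the standard scalar product: q(v) = <v, v>."""
    return np.dot(v, v)

rng = np.random.default_rng(1)
v1, v2 = rng.standard_normal(4), rng.standard_normal(4)

# 4<v1, v2> = q(v1 + v2) - q(v1 - v2)
print(np.isclose(4 * np.dot(v1, v2), q(v1 + v2) - q(v1 - v2)))  # True
```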
Consider \(V=\mathbb{R}^2.\) The function \[q :\mathbb{R}^2 \to \mathbb{R},\qquad \vec{v}=\begin{pmatrix} x \\ y \end{pmatrix} \mapsto q(\vec{v})=2x^2-4xy+5y^2\] is a quadratic form. Indeed, we have \(q(\vec{v})=\langle \vec{v},\vec{v}\rangle_\mathbf{A},\) where \[\mathbf{A}=\begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}.\]
Likewise, the function \[q : \mathbb{R}^3 \to \mathbb{R}, \qquad \vec{v}=\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto q(\vec{v})=4xy-6yz+z^2\] is a quadratic form. Indeed, we have \(q(\vec{v})=\langle \vec{v},\vec{v}\rangle_\mathbf{A},\) where \[\mathbf{A}=\begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & -3 \\ 0 & -3 & 1 \end{pmatrix}.\]
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space of dimension \(n\in \mathbb{N}\) and \(q : V \to \mathbb{R}\) a quadratic form. Then there exists an orthonormal ordered basis \(\mathbf{b}=(v_1,\ldots,v_n)\) of \(V\) with corresponding linear coordinate system \(\boldsymbol{\beta}: V \to \mathbb{R}^n\) and a diagonal matrix \(\mathbf{D}\in M_{n,n}(\mathbb{R})\) such that for all \(v \in V\) \[q(v)=\boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v).\]
The lines spanned by the vectors \(v_i\) for \(1\leqslant i\leqslant n\) of the orthonormal basis are known as the principal axes of the quadratic form \(q\). We will explain this terminology below.
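For the quadratic form \(q=2x^2-4xy+5y^2\) from the earlier example, the principal axes and the diagonal form can be computed numerically (a sketch; the ascending eigenvalue order of `eigh` is a convention, so the axes come out relabelled compared with the notes' \(6X^2+Y^2\)):

```python
import numpy as np

# Matrix of the symmetric bilinear form with q(v) = v^T A v
A = np.array([[2.0, -2.0],
              [-2.0, 5.0]])
w, V = np.linalg.eigh(A)  # columns of V span the principal axes

print(np.round(w, 6))  # [1. 6.]: the diagonal form, up to relabelling the axes

# Check on a sample vector: q agrees with the diagonal form in the
# coordinates with respect to the orthonormal eigenbasis.
v = np.array([3.0, -1.0])
XY = V.T @ v
print(np.isclose(v @ A @ v, w[0] * XY[0]**2 + w[1] * XY[1]**2))  # True
```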
The proof of Proposition 10.44 implies that an endomorphism \(f : V \to V\) of a finite dimensional Euclidean space \((V,\langle\cdot{,}\cdot\rangle)\) is self-adjoint if and only if its matrix representation with respect to an orthonormal basis \(\mathbf{b}\) of \(V\) is symmetric. In particular, in \(\mathbb{R}^n\) equipped with the standard scalar product, a mapping \(f_\mathbf{A}: \mathbb{R}^n \to \mathbb{R}^n\) for \(\mathbf{A}\in M_{n,n}(\mathbb{R})\) is self-adjoint if and only if \(\mathbf{A}\) is symmetric.
Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space and \(f : V \to V\) an orthogonal transformation. Then \(f\) is normal. Indeed, using that \(f\) is orthogonal, we obtain for all \(u,v \in V\) \[\langle f^{-1}(u),v\rangle=\langle f(f^{-1}(u)),f(v)\rangle=\langle u,f(v)\rangle\] so that the adjoint mapping of an orthogonal transformation is its inverse mapping, \(f^*=f^{-1}.\) It follows that \(f\circ f^{*}=f\circ f^{-1}=\mathrm{Id}_V=f^{-1}\circ f=f^*\circ f\) so that \(f\) is normal.
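For instance (a sketch with a rotation of \(\mathbb{R}^2\); with respect to an orthonormal basis an orthogonal transformation is represented by an orthogonal matrix, and the adjoint by the transpose):

```python
import numpy as np

t = 0.7  # an arbitrary angle
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])  # rotation matrix: orthogonal

# The adjoint is represented by R^T, and indeed R^T = R^{-1}:
print(np.allclose(R.T @ R, np.eye(2)))     # True: f* o f = Id_V
print(np.allclose(R.T, np.linalg.inv(R)))  # True: f* = f^{-1}
```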
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space, \(\mathbf{b}=(v_1,\ldots,v_n)\) an ordered basis of \(V\) with associated linear coordinate system \(\boldsymbol{\beta}: V \to \mathbb{K}^n\) and \(\langle\cdot{,}\cdot\rangle\) a bilinear form on \(V.\) Then
for all \(w_1,w_2 \in V\) we have \[\langle w_1,w_2\rangle=(\boldsymbol{\beta}(w_1))^T\mathbf{M}(\langle\cdot{,}\cdot\rangle,\mathbf{b})\boldsymbol{\beta}(w_2).\]
\(\langle\cdot{,}\cdot\rangle\) is symmetric if and only if \(\mathbf{M}(\langle\cdot{,}\cdot\rangle,\mathbf{b})\) is symmetric;
if \(\mathbf{b}^{\prime}\) is another ordered basis of \(V,\) then \[\mathbf{M}(\langle\cdot{,}\cdot\rangle,\mathbf{b}^{\prime})=\mathbf{C}^T\mathbf{M}(\langle\cdot{,}\cdot\rangle,\mathbf{b})\mathbf{C},\] where \(\mathbf{C}=\mathbf{C}(\mathbf{b}^{\prime},\mathbf{b})\) denotes the change of basis matrix, see Definition 3.104.
Let \(n \in \mathbb{N}\) and \((V,\langle\cdot{,}\cdot\rangle)\) be an \(n\)-dimensional Euclidean space equipped with an orthonormal ordered basis \(\mathbf{b}.\) Then an endomorphism \(f : V \to V\) is an orthogonal transformation if and only if its matrix representation \(\mathbf{R}=\mathbf{M}(f,\mathbf{b},\mathbf{b})\) with respect to \(\mathbf{b}\) is an orthogonal matrix.
Especially in the physics literature it is customary to also use the letters \(x,y\) to denote functions from \(\mathbb{R}^2 \to \mathbb{R}\) (and likewise for higher dimensions). The function \(x\) returns the first component of a vector \(\vec{v} \in \mathbb{R}^2\) and \(y\) returns the second component, so that for instance \[x\left(\begin{pmatrix} 2 \\ -4\end{pmatrix}\right)=2\qquad \text{and}\qquad y\left(\begin{pmatrix} 3 \\ 5\end{pmatrix}\right)=5.\] Thinking of \(x,y\) as functions, and doing the same for the coordinates \(X,Y\) with respect to the new orthonormal basis, the quadratic form from the previous example can then be written as (notice that we write \(q\) and not \(q(\vec{v})\)) \[q=2x^2-4xy+5y^2=6X^2+Y^2.\]
Let \(q :V \to \mathbb{R}\) be a quadratic form and \(c \in \mathbb{R}.\) A quadric \(Q\) in \(V\) is the set of solutions \(v\in V\) to an equation of the form \(q(v)=c.\)
The set \[Q=\left\{(x,y) \in \mathbb{R}^2 | 2x^2-4xy+5y^2=1\right\}\] is a quadric in \(\mathbb{R}^2.\) Written this way it is not immediately clear what the solution set looks like. With respect to the new orthonormal basis \(\mathbf{b}=(v_1,v_2)\) provided by the example above, we can however write \(Q\) as \[Q=\left\{\vec{v} \in \mathbb{R}^2| 6X(\vec{v})^2+Y(\vec{v})^2=1\right\}\] and we recognise \(Q\) as an ellipse. The \(X\)-axis spanned by \(v_1\) and the \(Y\)-axis spanned by \(v_2\) are symmetry axes of the ellipse and are known as its principal axes, see Figure 10.6.
(\(\heartsuit\) - not examinable). Quadratic forms also play an important role in calculus. Let \(f : \mathbb{R}^n \to \mathbb{R}\) be a twice continuously differentiable function. The Hessian matrix of \(f\) at \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{R}^n\) is given by \[[\mathbf{H}_f(\vec{x})]_{ij}=\frac{\partial^2 f}{\partial x_i \partial x_j},\] where \(1\leqslant i,j\leqslant n.\) By Schwarz's theorem, this matrix is symmetric and hence for each \(\vec{x} \in \mathbb{R}^n\) we obtain a quadratic form on \(\mathbb{R}^n\) defined by the rule \[q(\vec{h})=\frac{1}{2}\vec{h}^T\mathbf{H}_f(\vec{x})\vec{h}=\frac{1}{2}\langle \vec{h},\vec{h}\rangle_{\mathbf{H}_f(\vec{x})}\] for all \(\vec{h} \in \mathbb{R}^n,\) where \(\langle\cdot{,}\cdot\rangle\) denotes the standard scalar product of \(\mathbb{R}^n.\) The significance of this quadratic form arises from the Taylor approximation of \(f.\) For vectors \(\vec{h} \in \mathbb{R}^n\) of small length we have the approximation \[f(\vec{x}+\vec{h})\approx f(\vec{x})+\langle \nabla f(\vec{x}),\vec{h}\rangle+\frac{1}{2}\langle \vec{h},\vec{h}\rangle_{\mathbf{H}_f(\vec{x})},\] where \(\nabla f(\vec{x})\) denotes the gradient of \(f\) at \(\vec{x}.\) Recall that at a critical point \(\vec{x}\) of \(f\) we have \(\nabla f(\vec{x})=0_{\mathbb{R}^n}\) and hence \[f(\vec{x}+\vec{h})\approx f(\vec{x})+q(\vec{h}).\] In order to decide whether \(f\) admits a local maximum or a local minimum at a critical point, one thus needs to investigate the sign of \(q(\vec{h})\) for all \(\vec{h}.\)
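The second-derivative test sketched above can be illustrated in code; the example functions, their (constant) Hessians, and the name `classify` are chosen here for illustration only:

```python
import numpy as np

def classify(H):
    """Classify a critical point from the quadratic form
    q(h) = (1/2) h^T H h via the signs of the eigenvalues of H."""
    w = np.linalg.eigvalsh(H)  # H is symmetric by Schwarz's theorem
    if np.all(w > 0):
        return "local minimum"
    if np.all(w < 0):
        return "local maximum"
    if np.any(w > 0) and np.any(w < 0):
        return "saddle point"
    return "inconclusive"  # semi-definite: higher-order terms decide

# f(x, y) = x^2 + 3y^2 and g(x, y) = x^2 - y^2, both critical at the origin
H_f = np.array([[2.0, 0.0], [0.0, 6.0]])   # Hessian of f
H_g = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of g

print(classify(H_f))  # local minimum
print(classify(H_g))  # saddle point
```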
The previous remark is one motivation for the following definition:
Let \(q : V \to \mathbb{R}\) be a quadratic form on the \(\mathbb{R}\)-vector space \(V.\) Then \(q\) is called
positive or positive semi-definite if \(q(v)\geqslant 0\) for all \(v \in V\);
positive definite if \(q(v)\geqslant 0\) for all \(v \in V\) and \(q(v)=0\) if and only if \(v=0_V\);
negative or negative semi-definite if \(q(v)\leqslant 0\) for all \(v \in V\);
negative definite if \(q(v)\leqslant 0\) for all \(v \in V\) and \(q(v)=0\) if and only if \(v=0_V\);
indefinite if there exists \(v \in V\) and \(w \in V\) such that \(q(v)<0\) and \(q(w)>0.\)
Exercises
With \(\mathbf{D}\) the diagonal matrix representing \(q\) as in the proposition above, show the following characterisations:
\(q\) is positive if and only if all diagonal entries of \(\mathbf{D}\) are greater than or equal to zero;
\(q\) is positive definite if and only if all diagonal entries of \(\mathbf{D}\) are positive;
\(q\) is negative if and only if all diagonal entries of \(\mathbf{D}\) are less than or equal to zero;
\(q\) is negative definite if and only if all diagonal entries of \(\mathbf{D}\) are negative;
\(q\) is indefinite if and only if \(\mathbf{D}\) has positive and negative diagonal entries.
Solution
Let \(\mathbf{b}=(v_1,\ldots, v_n)\) be an ordered orthonormal basis of \(V\) such that \(q(v) = \boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v)\) for all \(v\in V,\) where \(\mathbf{D}= \operatorname{diag}(d_1,\ldots,d_n).\) Observe that \[q(v_i) = \boldsymbol{\beta}(v_i)^T\mathbf{D}\boldsymbol{\beta}(v_i) = \vec e_i ^T\mathbf{D}\vec e_i=d_i.\]
Assume that \(q\) is positive, i.e. \(q(v)\geqslant 0\) for all \(v\in V.\) By choosing \(v=v_i,\) we obtain \(0\leqslant q(v_i) =d_i\) and hence \(d_i\geqslant 0\) for all \(i=1,\ldots,n.\) Conversely, if \(d_i\geqslant 0\) for all \(i=1,\ldots,n\) and \(v = s_1v_1+\ldots+s_nv_n,\) we compute \[\begin{aligned} q(v) & = \left(\sum_{i=1}^ns_i \vec e_i^T\right)\mathbf{D}\left(\sum_{j=1}^ns_j \vec e_j\right) = \sum_{i,j=1}^ns_is_j[\mathbf{D}]_{ij}=\sum_{i=1}^ns_i^2d_i \geqslant 0. \end{aligned}\]
Assume that \(q\) is positive definite. Then \(d_i=q(v_i)>0\) for all \(i=1,\ldots,n,\) since \(v_i\ne 0_V.\) Conversely, assume \(d_i>0\) for all \(i\in\{1,\ldots,n\}.\) The computation of the previous item shows that \[q(v)=\sum_{i=1}^ns_i^2d_i\geqslant 0,\] with \(q(v)=0\) if and only if \(s_i=0\) for all \(i\in\{1,\ldots,n\},\) that is, if and only if \(v=0_V.\)
The negative semi-definite case is analogous to the positive semi-definite case, with all inequalities reversed.
The negative definite case is analogous to the positive definite case, with all inequalities reversed.
Assume that \(q\) is indefinite, i.e. there exist vectors \[v=\sum_{i=1}^n s_i v_i,\qquad w=\sum_{j=1}^n t_j v_j\] such that \(q(v)<0\) and \(q(w)>0.\) Since \[q(v) = \sum_{i=1}^ns_i^2d_i<0,\] there must be at least one index \(k\) such that \(d_k<0.\) On the other hand, since \[q(w) = \sum_{j=1}^nt_j^2d_j>0,\] there must be at least one index \(\ell\) such that \(d_{\ell}>0.\) Conversely, if \(\mathbf{D}\) has positive and negative diagonal entries, say \(d_k<0\) and \(d_{\ell}>0\) for indices \(1\leqslant k\ne \ell\leqslant n,\) then \(q(v_k)=d_k<0\) and \(q(v_{\ell})=d_{\ell} >0,\) hence \(q\) is indefinite.
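The characterisations just proved translate directly into a small classifier over the diagonal entries of \(\mathbf{D}\) (a sketch; the function name `definiteness` is chosen here):

```python
import numpy as np

def definiteness(diagonal):
    """Classify the quadratic form represented by D = diag(diagonal),
    using the characterisations shown above."""
    d = np.asarray(diagonal)
    if np.all(d > 0):
        return "positive definite"
    if np.all(d < 0):
        return "negative definite"
    if np.any(d > 0) and np.any(d < 0):
        return "indefinite"
    return "positive semi-definite" if np.all(d >= 0) else "negative semi-definite"

print(definiteness([1.0, 6.0]))   # positive definite
print(definiteness([2.0, 0.0]))   # positive semi-definite
print(definiteness([-1.0, 3.0]))  # indefinite
```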