10 The determinant, II
10.1 Properties of the determinant
For matrices \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) we have \[\det(\mathbf{A}\mathbf{B})=\det(\mathbf{A})\det(\mathbf{B}).\]
Proof. We first consider the case where \(\mathbf{A}\) is not invertible; then \(\det(\mathbf{A})=0\) (see the proof of Proposition 9.15). If \(\mathbf{A}\) is not invertible, then neither is \(\mathbf{A}\mathbf{B}.\) Indeed, if \(\mathbf{A}\mathbf{B}\) were invertible, there would exist a matrix \(\mathbf{C}\) such that \((\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{1}_{n}.\) Since the matrix product is associative, this also gives \(\mathbf{A}(\mathbf{B}\mathbf{C})=\mathbf{1}_{n},\) so that \(\mathbf{B}\mathbf{C}\) is a right inverse of \(\mathbf{A}.\) By Proposition 7.7, \(\mathbf{A}\) would then be invertible, a contradiction. Hence \(\det(\mathbf{A}\mathbf{B})=0=\det(\mathbf{A})\det(\mathbf{B})\) whenever \(\mathbf{A}\) is not invertible.
If \(\mathbf{A}\) is invertible, we can write it as a product of elementary matrices and, applying the second part of Lemma 9.14 repeatedly (once per elementary factor), we conclude that \(\det(\mathbf{A}\mathbf{B})=\det(\mathbf{A})\det(\mathbf{B}).\)
A matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is invertible if and only if \(\det(\mathbf{A})\neq 0.\) Moreover, in the case where \(\mathbf{A}\) is invertible, we have \[\det\left(\mathbf{A}^{-1}\right)=\frac{1}{\det \mathbf{A}}.\]
Proof. We have already seen that if \(\mathbf{A}\) is not invertible, then \(\det(\mathbf{A})=0.\) Equivalently, if \(\det(\mathbf{A})\neq 0,\) then \(\mathbf{A}\) is invertible. It thus remains to show that if \(\mathbf{A}\) is invertible, then \(\det(\mathbf{A})\neq 0.\) Suppose \(\mathbf{A}\) is invertible; then applying Proposition 10.1 gives \[1=\det(\mathbf{1}_{n})=\det\left(\mathbf{A}\mathbf{A}^{-1}\right)=\det(\mathbf{A})\det\left(\mathbf{A}^{-1}\right),\] so that \(\det(\mathbf{A})\neq 0\) and \(\det\left(\mathbf{A}^{-1}\right)=1/\det(\mathbf{A}).\)
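Both statements are easy to test numerically. The following is a minimal sketch in Python (assuming NumPy is available; the matrices \(\mathbf{A},\mathbf{B}\) are arbitrary choices):

```python
import numpy as np

# Two arbitrary real 2x2 matrices; A is invertible since det(A) = 5.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [4.0, -2.0]])

# Proposition 10.1: det(AB) = det(A) det(B).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Corollary: det(A^{-1}) = 1 / det(A).
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))
```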
Recall that for scalars \(x_1,\ldots,x_n \in \mathbb{K},\) we write \[\prod_{i=1}^n x_i=x_1x_2\cdots x_n.\]
Let \(n \in \mathbb{N}\) and \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n} \in M_{n,n}(\mathbb{K})\) be an upper triangular matrix so that \(A_{ij}=0\) for \(i>j.\) Then \[\tag{10.1} \det(\mathbf{A})=\prod_{i=1}^n A_{ii}=A_{11}A_{22}\cdots A_{nn}.\]
Proof. We use induction. For \(n=1\) the condition \(A_{ij}=0\) for \(i>j\) is vacuous and (10.1) reduces to \(\det(\mathbf{A})=A_{11},\) which holds by the definition of the determinant; thus the statement is anchored.
Inductive step: Assume \(n \in \mathbb{N}\) and \(n\geqslant 2.\) We want to show that if (10.1) holds for upper triangular \((n-1)\times(n-1)\)-matrices, then it also holds for upper triangular \(n\times n\)-matrices. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n} \in M_{n,n}(\mathbb{K})\) be an upper triangular matrix. Choosing \(l=1\) in the formula for \(\det(\mathbf{A}),\) we obtain \[\begin{aligned} \det(\mathbf{A})&=\sum_{k=1}^n(-1)^{k+1}A_{k1}\det\left(\mathbf{A}^{(k,1)}\right)=A_{11}\det\left(\mathbf{A}^{(1,1)}\right)+\sum_{k=2}^n(-1)^{k+1}A_{k1}\det\left(\mathbf{A}^{(k,1)}\right)\\ &=A_{11}\det\left(\mathbf{A}^{(1,1)}\right), \end{aligned}\] where the last equality uses that \(A_{k1}=0\) for \(k>1.\) We have \(\mathbf{A}^{(1,1)}=(A_{ij})_{2\leqslant i,j\leqslant n}\) and since \(\mathbf{A}\) is an upper triangular matrix, it follows that \(\mathbf{A}^{(1,1)}\) is an \((n-1)\times(n-1)\) upper triangular matrix as well. Hence by the induction hypothesis, we obtain \[\det(\mathbf{A}^{(1,1)})=\prod_{i=2}^nA_{ii}.\] We conclude that \(\det(\mathbf{A})=\prod_{i=1}^n A_{ii},\) as claimed.
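Numerically, (10.1) can be checked directly; a short sketch in Python (assuming NumPy, with an arbitrarily chosen upper triangular matrix):

```python
import numpy as np

# An arbitrary upper triangular matrix: A_ij = 0 for i > j.
A = np.array([[2.0, 1.0, -1.0, 3.0],
              [0.0, 5.0,  2.0, 0.0],
              [0.0, 0.0, -3.0, 1.0],
              [0.0, 0.0,  0.0, 4.0]])

# Formula (10.1): the determinant is the product of the diagonal entries.
assert np.isclose(np.linalg.det(A), np.prod(np.diag(A)))  # both equal -120
```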
10.2 Permutations
A rearrangement of the natural numbers from \(1\) up to \(n\) is called a permutation:
Let \(n \in \mathbb{N}\) and \(\mathcal{X}_n=\{1,2,3,\ldots,n\}.\) A permutation is a bijective mapping \(\sigma : \mathcal{X}_n \to \mathcal{X}_n.\) The set of all permutations of \(\mathcal{X}_n\) is denoted by \(S_n.\)
If \(\tau,\sigma : \mathcal{X}_n \to \mathcal{X}_n\) are permutations, it is customary to write \(\tau\sigma\) or \(\tau\cdot \sigma\) instead of \(\tau \circ \sigma.\) Furthermore, the identity mapping \(\mathrm{Id}_{\mathcal{X}_n}\) is often simply denoted by \(1.\) A convenient way to describe a permutation \(\sigma \in S_n\) is in terms of a \(2\times n\) matrix \[\begin{pmatrix} i \\ \sigma(i) \end{pmatrix}_{1\leqslant i \leqslant n},\] which we denote by \(\boldsymbol{\sigma}.\) For instance, for \(n=4,\) the matrix \[\boldsymbol{\sigma}=\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix}\] corresponds to the permutation \(\sigma\) satisfying \(\sigma(1)=2,\sigma(2)=3,\sigma(3)=1,\sigma(4)=4.\)
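For computations it is convenient to encode \(\sigma\) by the bottom row of \(\boldsymbol{\sigma}.\) A minimal sketch in Python (the tuple encoding and the helper name `compose` are our own conventions):

```python
# Encode sigma in S_n as the tuple (sigma(1), ..., sigma(n)),
# i.e. the bottom row of the two-row matrix notation.
sigma = (2, 3, 1, 4)   # the example above: sigma(1)=2, ..., sigma(4)=4

def compose(tau, sigma):
    """Return the product tau . sigma, i.e. the map i -> tau(sigma(i))."""
    return tuple(tau[s - 1] for s in sigma)

print(compose(sigma, sigma))  # (3, 1, 2, 4), since e.g. sigma(sigma(1)) = sigma(2) = 3
```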
Permutations which only swap two natural numbers and leave all remaining numbers unchanged are known as transpositions:
Let \(n \in \mathbb{N}\) and \(1\leqslant k,l\leqslant n\) with \(k\neq l.\) The transposition \(\tau_{k,l} \in S_n\) is the permutation satisfying \[\tau_{k,l}(k)=l, \qquad \tau_{k,l}(l)=k, \qquad \tau_{k,l}(i)=i \text{ if } i \notin \{k,l\}.\]
Every permutation \(\sigma \in S_n\) defines a linear map \(g : \mathbb{K}^n \to \mathbb{K}^n\) satisfying \(g(\vec{e}_i)=\vec{e}_{\sigma(i)},\) where \(\{\vec{e}_1,\ldots,\vec{e}_n\}\) denotes the standard basis of \(\mathbb{K}^n.\) Since \(g\) is linear, there exists a unique matrix \(\mathbf{P}_{\sigma} \in M_{n,n}(\mathbb{K})\) so that \(g=f_{\mathbf{P}_{\sigma}}.\) Observe that the column vectors of the matrix \(\mathbf{P}_{\sigma}\) are given by \(\vec{e}_{\sigma(1)},\vec{e}_{\sigma(2)},\ldots,\vec{e}_{\sigma(n)}.\)
We call \(\mathbf{P}_{\sigma} \in M_{n,n}(\mathbb{K})\) the permutation matrix associated to \(\sigma \in S_n.\)
Let \(n=4.\) For instance, we have \[\boldsymbol{\sigma}=\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix} \qquad \mathbf{P}_{\sigma}=\begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\] and \[\boldsymbol{\tau}_{2,4}=\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}\qquad \mathbf{P}_{\tau_{2,4}}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.\]
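Since the \(i\)-th column of \(\mathbf{P}_\sigma\) is \(\vec{e}_{\sigma(i)},\) building a permutation matrix in code is immediate; a sketch using the tuple encoding from above (`permutation_matrix` is our own helper name):

```python
import numpy as np

def permutation_matrix(sigma):
    """Build P_sigma: the entry in row sigma(i), column i equals 1."""
    n = len(sigma)
    P = np.zeros((n, n), dtype=int)
    for i, s in enumerate(sigma):   # i is 0-based, s = sigma(i+1)
        P[s - 1, i] = 1
    return P

print(permutation_matrix((2, 3, 1, 4)))   # reproduces P_sigma from the example
```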
Notice that \(\mathbf{P}_{\tau_{k,l}}=\mathbf{P}_{k,l},\) where \(\mathbf{P}_{k,l}\) is one of the elementary matrices of size \(n\) (see M01 Algorithmics).
Assigning to a permutation its permutation matrix turns composition of permutations into matrix multiplication:
Let \(n \in \mathbb{N}.\) Then \(\mathbf{P}_1=\mathbf{1}_{n}\) and for all \(\sigma,\pi \in S_n\) we have \[\mathbf{P}_{\pi \cdot \sigma}=\mathbf{P}_\pi \mathbf{P}_\sigma.\] In particular, for all \(\sigma \in S_n,\) the permutation matrix \(\mathbf{P}_\sigma\) is invertible with \((\mathbf{P}_{\sigma})^{-1}=\mathbf{P}_{\sigma^{-1}}.\)
Consider \(n=3.\) For \[\boldsymbol{\sigma}=\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix} \quad \text{and}\qquad \boldsymbol{\pi}=\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}\qquad \text{we have} \quad \boldsymbol{\pi}\cdot\boldsymbol{\sigma}=\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3\end{pmatrix},\] as well as \[\mathbf{P}_\sigma=\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \mathbf{P}_\pi=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \mathbf{P}_{\pi\cdot \sigma}=\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.\] Thus we obtain \[\mathbf{P}_{\pi\cdot\sigma}=\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}=\mathbf{P}_\pi \mathbf{P}_\sigma,\] as claimed by Proposition 10.11.
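The computation of this example (and Proposition 10.11 in general) can be replayed numerically; a sketch reusing the conventions from above:

```python
import numpy as np

def permutation_matrix(sigma):
    n = len(sigma)
    P = np.zeros((n, n), dtype=int)
    for i, s in enumerate(sigma):
        P[s - 1, i] = 1
    return P

sigma, pi = (3, 1, 2), (1, 3, 2)
pi_sigma = tuple(pi[s - 1] for s in sigma)   # (pi . sigma)(i) = pi(sigma(i))
assert pi_sigma == (2, 1, 3)

# Proposition 10.11: P_{pi.sigma} = P_pi P_sigma.
assert np.array_equal(permutation_matrix(pi_sigma),
                      permutation_matrix(pi) @ permutation_matrix(sigma))
```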
Proof of Proposition 10.11. The matrix \(\mathbf{P}_1\) has column vectors given by \(\vec{e}_{1},\ldots,\vec{e}_n,\) hence \(\mathbf{P}_1=\mathbf{1}_{n}.\)
Using Proposition 7.2 and Theorem 7.6 it is sufficient to show that for all \(\pi,\sigma \in S_n\) we have \(f_{\mathbf{P}_{\pi\cdot \sigma}}=f_{\mathbf{P}_\pi}\circ f_{\mathbf{P}_\sigma}.\) For all \(1\leqslant i\leqslant n,\) we obtain \[f_{\mathbf{P}_\pi}\left(f_{\mathbf{P}_\sigma}(\vec{e}_i)\right)=f_{\mathbf{P}_\pi}\left(\vec{e}_{\sigma(i)}\right)=\vec{e}_{\pi(\sigma(i))}=\vec{e}_{(\pi\cdot\sigma)(i)}=f_{\mathbf{P}_{\pi\cdot \sigma}}(\vec{e}_i).\] The two maps thus agree on the ordered basis \(\mathbf{e}=(\vec{e}_1,\ldots,\vec{e}_n)\) of \(\mathbb{K}^n,\) so that the second claim follows by applying Lemma 8.6.
We have \[\mathbf{P}_{\sigma \cdot \sigma^{-1}}=\mathbf{P}_{1}=\mathbf{1}_{n}=\mathbf{P}_{\sigma}\mathbf{P}_{\sigma^{-1}}\] showing that \(\mathbf{P}_{\sigma}\) is invertible with inverse \((\mathbf{P}_{\sigma})^{-1}=\mathbf{P}_{\sigma^{-1}}.\)
For \(\sigma \in S_n\) we call \(\operatorname{sgn}(\sigma)=\det(\mathbf{P}_{\sigma})\) its signature.
Combining Proposition 10.1 and Proposition 10.11, we conclude that \[\operatorname{sgn}(\pi\cdot \sigma)=\operatorname{sgn}(\pi)\operatorname{sgn}(\sigma)\] for all \(\pi,\sigma \in S_n.\)
Since \(\mathbf{P}_{\tau_{k,l}}=\mathbf{P}_{k,l}\) and \(\det \mathbf{P}_{k,l}=-1\) by Lemma 9.14, we conclude that \[\operatorname{sgn}(\tau_{k,l})=-1\] for all transpositions \(\tau_{k,l} \in S_n.\)
Similarly to elementary matrices being the building blocks of invertible matrices, transpositions are the building blocks of permutations:
Let \(n\in \mathbb{N}\) and \(\sigma \in S_n.\) Then there exists \(m \geqslant 0\) and \(m\) transpositions \(\tau_{k_1,l_1},\ldots, \tau_{k_m,l_m} \in S_n\) such that \(\sigma=\tau_{k_m,l_m}\cdots \tau_{k_1,l_1},\) where we use the convention that \(0\) transpositions corresponds to the identity permutation.
Let \(n=6\) and let \(\sigma\) be the permutation defined by the matrix \[\boldsymbol{\sigma}=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 5 & 2 & 4 & 6 & 1\end{pmatrix}.\] To express it as a product of transpositions, we sort the bottom row, swapping two entries at a time and recording the corresponding transposition of positions: \[\begin{array}{cccccc|c} 3 & 5 & 2 & 4 & 6 & 1 & \\ 3 & 2 & 5 & 4 & 6 & 1 & \tau_{2,3} \\ 1 & 2 & 5 & 4 & 6 & 3 & \tau_{1,6} \\ 1 & 2 & 5 & 4 & 3 & 6 & \tau_{5,6} \\ 1 & 2 & 3 & 4 & 5 & 6 & \tau_{3,5}\end{array}\] Each step multiplies by the recorded transposition from the right, so \(\sigma\tau_{2,3}\tau_{1,6}\tau_{5,6}\tau_{3,5}=1;\) since every transposition is its own inverse, this gives \(\sigma=\tau_{3,5}\tau_{5,6}\tau_{1,6}\tau_{2,3}.\)
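The sorting procedure of this example is easy to mechanize: repeatedly swap the value \(v\) into position \(v\) and record the transposition. A sketch in Python (`decompose` is our own name; it may return a different, equally valid decomposition than the one found by hand above):

```python
def decompose(sigma):
    """Write sigma as a product of transpositions by sorting the bottom row;
    returns pairs (k, l) with sigma = tau_{k_m,l_m} ... tau_{k_1,l_1}."""
    row = list(sigma)                    # bottom row (sigma(1), ..., sigma(n))
    swaps = []
    for v in range(1, len(row) + 1):
        pos = row.index(v) + 1           # 1-based position of the value v
        if pos != v:                     # swap v into position v, i.e. apply
            row[pos - 1], row[v - 1] = row[v - 1], row[pos - 1]
            swaps.append((v, pos))       # tau_{v,pos} on the right
    return swaps[::-1]                   # reversed: sigma = product right-to-left

print(decompose((3, 5, 2, 4, 6, 1)))     # [(5, 6), (3, 6), (2, 3), (1, 6)], length 4
```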
Proof of Proposition 10.15. We use induction. For \(n=1\) we have \(\mathcal{X}_n=\{1\}\) and the only permutation is the identity permutation \(1,\) so the statement is trivially true and hence anchored.
Inductive step: Assume \(n \in \mathbb{N}\) and \(n\geqslant 2.\) We want to show that if the claim holds for \(S_{n-1},\) then it also holds for \(S_n.\) Let \(\sigma \in S_n.\) If \(\sigma(n)=n,\) set \(\sigma_1=\sigma;\) otherwise let \(k=\sigma(n)\) and \(\sigma_1=\tau_{n,k}\sigma,\) which satisfies \(\sigma_1(n)=\tau_{n,k}(\sigma(n))=\tau_{n,k}(k)=n.\) In either case \(\sigma_1\) does not permute \(n.\) Restricting \(\sigma_1\) to the first \(n-1\) elements, we obtain a permutation of \(\{1,\ldots,n-1\}.\) By the induction hypothesis, there thus exist \(\tilde{m}\geqslant 0\) and transpositions \(\tau_{k_1,l_1},\ldots,\tau_{k_{\tilde{m}},l_{\tilde{m}}} \in S_n\) such that \[\sigma_1=\tau_{k_{\tilde{m}},l_{\tilde{m}}}\cdots \tau_{k_1,l_1}.\] If \(\sigma_1=\sigma,\) the claim follows with \(m=\tilde{m}.\) Otherwise, since \(\tau_{n,k}^2=1,\) multiplying \(\sigma_1=\tau_{n,k}\sigma\) from the left with \(\tau_{n,k}\) gives \(\sigma=\tau_{n,k}\tau_{k_{\tilde{m}},l_{\tilde{m}}}\cdots \tau_{k_1,l_1},\) and the claim follows with \(m=\tilde{m}+1.\)
Combining Definition 10.13, Remark 10.14 and Proposition 10.15, we conclude:
Let \(n \in \mathbb{N}\) and \(\sigma \in S_n.\) Then \(\operatorname{sgn}(\sigma)=\pm 1.\) If \(\sigma\) is a product of \(m\) transpositions, then \(\operatorname{sgn}(\sigma)=(-1)^m.\)
Permutations with \(\operatorname{sgn}(\sigma)=1\) are called even and permutations with \(\operatorname{sgn}(\sigma)=-1\) are called odd, since they arise from the composition of an even or odd number of transpositions, respectively.
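Definition 10.13 can be used verbatim to compute signatures; a sketch (assuming NumPy, and rounding since the determinant of a permutation matrix is exactly \(\pm 1\)):

```python
import numpy as np

def sgn(sigma):
    """Signature via Definition 10.13: sgn(sigma) = det(P_sigma)."""
    n = len(sigma)
    P = np.zeros((n, n))
    for i, s in enumerate(sigma):
        P[s - 1, i] = 1
    return round(np.linalg.det(P))

# The permutation of the decomposition example is a product of 4 transpositions:
print(sgn((3, 5, 2, 4, 6, 1)))  # 1 = (-1)^4, an even permutation
```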
10.3 The Leibniz formula
Besides the Laplace expansion, there is also a formula for the determinant which relies on permutations. As a warm-up, we first consider the case \(n=2.\) Using the linearity of the determinant in the first row, we obtain \[\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}=\det\begin{pmatrix} a & 0 \\ c & d \end{pmatrix} +\det \begin{pmatrix} 0 & b \\ c & d \end{pmatrix},\] where \(a,b,c,d \in \mathbb{K}.\) Using the linearity of the determinant in the second row, we can further decompose the two summands above: \[\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}=\underbrace{\det \begin{pmatrix} a & 0 \\ c & 0 \end{pmatrix}+\det\begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}}_{=\det \begin{pmatrix} a & 0 \\ c & d\end{pmatrix}}+\underbrace{\det \begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}+\det\begin{pmatrix} 0 & b \\ 0 & d \end{pmatrix}}_{=\det \begin{pmatrix} 0 & b \\ c & d\end{pmatrix}}.\] The first and fourth summands are always zero due to the occurrence of a zero column. The second and third summands are possibly nonzero (they may still vanish if some of \(a,b,c,d\) are zero). In any case, we get \[\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}=\det \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}+\det \begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}.\] We can proceed analogously in general. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n} \in M_{n,n}(\mathbb{K}).\) We denote the rows of \(\mathbf{A}\) by \(\vec{\alpha}_1,\ldots,\vec{\alpha}_n.\) Using the linearity of \(\det\) in the first row, we can write \[\begin{gathered} \det \mathbf{A}=\det\begin{pmatrix} A_{11} & 0 & 0 & \cdots & 0 \\ && \vec{\alpha}_2 \\ && \vdots \\ && \vec{\alpha}_n&&\end{pmatrix}+\det\begin{pmatrix} 0 & A_{12} & 0 & \cdots & 0 \\ && \vec{\alpha}_2 \\ && \vdots \\ && \vec{\alpha}_n&&\end{pmatrix}+\cdots\\ \cdots +\det\begin{pmatrix} 0 & 0 & 0 & \cdots & A_{1n} \\ && \vec{\alpha}_2 \\ && \vdots \\ && \vec{\alpha}_n&&\end{pmatrix}. \end{gathered}\] We can now use the linearity in the second row and proceed in the same fashion with each of the above summands. We continue this procedure until the \(n\)-th row. As a result, we can write \[\tag{10.2} \det \mathbf{A}= \sum_{k=1}^{n^n} \det \mathbf{M}_k,\] where each of the matrices \(\mathbf{M}_k\) has exactly one possibly nonzero entry in each row. As above, some of the matrices \(\mathbf{M}_k\) have a zero column, so that their determinant vanishes. The matrices \(\mathbf{M}_k\) without a zero column must have exactly one possibly nonzero entry in each row and each column. We can thus write the matrices \(\mathbf{M}_k\) with possibly nonzero determinant as \[\mathbf{M}_k=\sum_{i=1}^n A_{\sigma(i)i}\mathbf{E}_{\sigma(i),i}\] for some permutation \(\sigma \in S_n.\) Every permutation of \(\{1,\ldots,n\}\) occurs precisely once in the expansion (10.2), hence we can write \[\det \mathbf{A}=\sum_{\sigma \in S_n}\det\left(\sum_{i=1}^nA_{\sigma(i)i}\mathbf{E}_{\sigma(i),i}\right),\] where the notation \(\sum_{\sigma \in S_n}\) means that we sum over all possible permutations of \(\{1,\ldots,n\}.\) We will next rewrite the matrix \(\sum_{i=1}^n A_{\sigma(i)i}\mathbf{E}_{\sigma(i),i}\) in a different form.
To this end notice that for all \(\sigma \in S_n,\) the permutation matrix \(\mathbf{P}_{\sigma}\) can be written as \(\mathbf{P}_{\sigma}=\sum_{i=1}^n \mathbf{E}_{\sigma(i),i}.\) Furthermore, the diagonal matrix \[\mathbf{D}_{\sigma}=\begin{pmatrix} A_{\sigma(1)1}& & & \\ & A_{\sigma(2)2} & & \\ && \ddots & \\ &&& A_{\sigma(n)n}\end{pmatrix}\] can be written as \(\mathbf{D}_{\sigma}=\sum_{j=1}^n A_{\sigma(j)j}\mathbf{E}_{j,j}.\) Since \(\mathbf{E}_{\sigma(i),i}\mathbf{E}_{j,j}=0\) for \(j\neq i\) and \(\mathbf{E}_{\sigma(i),i}\mathbf{E}_{i,i}=\mathbf{E}_{\sigma(i),i},\) we obtain \[\mathbf{P}_{\sigma}\mathbf{D}_{\sigma}=\sum_{i=1}^n \mathbf{E}_{\sigma(i),i}\sum_{j=1}^n A_{\sigma(j)j}\mathbf{E}_{j,j}=\sum_{i=1}^n\sum_{j=1}^n A_{\sigma(j)j}\mathbf{E}_{\sigma(i),i}\mathbf{E}_{j,j}=\sum_{i=1}^n A_{\sigma(i)i}\mathbf{E}_{\sigma(i),i}.\] We thus have the formula \[\det \mathbf{A}=\sum_{\sigma \in S_n} \det\left(\mathbf{P}_{\sigma}\mathbf{D}_{\sigma}\right)=\sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\det(\mathbf{D}_\sigma),\] where we use the product rule Proposition 10.1 and the definition of the signature of a permutation. By Proposition 10.4, the determinant of a diagonal matrix is the product of its diagonal entries, hence we obtain \[\det \mathbf{A}=\sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n A_{\sigma(i)i}.\] Finally, writing \(\pi=\sigma^{-1},\) we have \[\prod_{i=1}^n A_{\sigma(i)i}=\prod_{j=1}^n A_{j\pi(j)},\] and \(\operatorname{sgn}(\pi)=\operatorname{sgn}(\sigma)\) since \(\operatorname{sgn}(\sigma)\operatorname{sgn}(\sigma^{-1})=\operatorname{sgn}(1)=\det(\mathbf{1}_n)=1;\) moreover, summing over all \(\sigma\) is the same as summing over all \(\pi=\sigma^{-1}.\) We have thus shown:
Let \(n \in \mathbb{N}\) and \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n} \in M_{n,n}(\mathbb{K}).\) Then we have \[\tag{10.3} \det(\mathbf{A}) =\sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n A_{\sigma(i)i}=\sum_{\pi \in S_n}\operatorname{sgn}(\pi)\prod_{j=1}^n A_{j\pi(j)}.\]
For \(n=3\) we have six permutations \[\begin{aligned} \boldsymbol{\sigma}_1&=\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad \boldsymbol{\sigma}_2=\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \boldsymbol{\sigma}_3=\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}\\ \boldsymbol{\sigma}_4&=\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad \boldsymbol{\sigma}_5=\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad \boldsymbol{\sigma}_6=\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}. \end{aligned}\] For \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant 3} \in M_{3,3}(\mathbb{K}),\) the Leibniz formula gives \[\begin{gathered} \det(\mathbf{A})=\operatorname{sgn}(\sigma_1)A_{11}A_{22}A_{33}+\operatorname{sgn}(\sigma_2)A_{11}A_{23}A_{32}+\operatorname{sgn}(\sigma_3)A_{12}A_{21}A_{33}\\ +\operatorname{sgn}(\sigma_4)A_{12}A_{23}A_{31}+\operatorname{sgn}(\sigma_5)A_{13}A_{21}A_{32}+\operatorname{sgn}(\sigma_6)A_{13}A_{22}A_{31}, \end{gathered}\] so that in agreement with Example 9.20, we obtain \[\begin{gathered} \det \mathbf{A}=A_{11}A_{22}A_{33}-A_{11}A_{23}A_{32}-A_{12}A_{21}A_{33}\\+A_{12}A_{23}A_{31}+A_{13}A_{21}A_{32}-A_{13}A_{22}A_{31}. \end{gathered}\]
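A direct implementation of (10.3) iterates over all \(n!\) permutations, so it is useful only for small \(n,\) but it is a faithful transcription of the formula. A sketch in Python (here the signature is computed by counting inversions, using the standard fact that the parity of the number of inversions equals the parity of any transposition decomposition, so \(\operatorname{sgn}(\pi)=(-1)^{\#\text{inversions}}\)):

```python
from itertools import permutations

def sgn(pi):
    """Sign via inversion count: (-1)^(number of pairs i < j with pi(i) > pi(j))."""
    inv = sum(pi[i] > pi[j] for i in range(len(pi)) for j in range(i + 1, len(pi)))
    return -1 if inv % 2 else 1

def det_leibniz(A):
    """Leibniz formula (10.3), second form: sum of sgn(pi) * prod_j A[j][pi(j)]."""
    n = len(A)
    total = 0
    for pi in permutations(range(n)):    # pi as a 0-based bottom row
        prod = 1
        for j in range(n):
            prod *= A[j][pi[j]]
        total += sgn(pi) * prod
    return total

# The matrix of the Cramer's rule example below; its determinant is 4.
print(det_leibniz([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))  # 4
```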
It follows from Leibniz’ formula that \(\det(\mathbf{A}) = \det(\mathbf{A}^T)\) (see Exercise 10.3 below). This has the following important consequences:
the determinant is also multilinear and alternating when thought of as a map \((\mathbb{K}^n)^n \to \mathbb{K},\) that is, when taking \(n\) column vectors as input. In particular, the determinant is also linear in each column;
the Laplace expansion is also valid if we expand with respect to a row, that is, for \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and \(1\leqslant l \leqslant n,\) we have \[\det(\mathbf{A})=\sum_{k=1}^n(-1)^{k+l}[\mathbf{A}]_{lk}\det\left(\mathbf{A}^{(l,k)}\right).\]
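A recursive transcription of this row expansion (with \(l=1\)) gives a self-contained determinant routine; a sketch for small matrices only, since the recursion generates \(n!\) terms:

```python
def det_laplace(A):
    """Determinant by Laplace expansion along the first row (l = 1)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):   # 0-based column index; the sign is (-1)^{(k+1)+1} = (-1)^k
        minor = [row[:k] + row[k + 1:] for row in A[1:]]   # A^{(1,k+1)}
        total += (-1) ** k * A[0][k] * det_laplace(minor)
    return total

print(det_laplace([[1, 1, 2], [0, 2, 1], [1, 0, 2]]))  # 1, cf. the adjugate example below
```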
(\(\heartsuit\) – not examinable). For \(n \in \mathbb{N}\) and a vector \(\vec{x}=(x_i)_{1\leqslant i\leqslant n}\in \mathbb{K}^n\) we can form a matrix \(\mathbf{V}_{\vec{x}}=(V_{ij})_{1\leqslant i,j\leqslant n} \in M_{n,n}(\mathbb{K})\) with \(V_{ij}=x_i^{j-1},\) that is, \[\mathbf{V}_{\vec{x}}=\begin{pmatrix} 1 & x_1 & (x_1)^2 & \cdots & (x_1)^{n-1} \\ 1 & x_2 & (x_2)^2 & \cdots & (x_2)^{n-1} \\ 1 & x_3 & (x_3)^2 & \cdots & (x_3)^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & (x_n)^2 & \cdots & (x_n)^{n-1}\end{pmatrix}.\] Such matrices are known as Vandermonde matrices, and their determinants as Vandermonde determinants; they satisfy \[\det(\mathbf{V}_{\vec{x}})=\prod_{1\leqslant i<j \leqslant n}(x_j-x_i).\]
Sketch of a proof. We can define a function \(f : \mathbb{K}^n \to \mathbb{K},\) \(\vec{x} \mapsto \det(\mathbf{V}_{\vec{x}}).\) By the Leibniz formula, the function \(f\) is a polynomial in the variables \(x_i\) with integer coefficients. If we freeze all variables of \(f\) except the \(\ell\)-th variable, then we obtain a function \(g_{\ell} : \mathbb{K}\to \mathbb{K}\) of one variable \(x_{\ell}.\) For \(1\leqslant i\leqslant n\) with \(i\neq \ell\) we have \(g_{\ell}(x_i)=0,\) since we compute the determinant of a matrix with two identical rows, the \(\ell\)-th row and the \(i\)-th row. Factoring the zeros, we can thus write \(g_{\ell}(x_{\ell})=q_{\ell}(x_{\ell})\prod_{1\leqslant i\leqslant n, i\neq \ell}(x_{\ell}-x_i)\) for some polynomial \(q_{\ell}.\) We can repeat this argument for all \(\ell\) and hence can write \(\det(\mathbf{V}_{\vec{x}})=q(\vec{x})\prod_{1\leqslant i<j\leqslant n}(x_j-x_i)\) for some polynomial \(q(\vec{x}).\)
On the other hand, if we multiply all the \(x_i\) by a constant \(s \in \mathbb{K},\) the determinant multiplies by \(s^{1 + 2 + \dots + (n-1)} = s^{n(n-1)/2}.\) The product \(\prod_{1\leqslant i<j\leqslant n}(x_j-x_i)\) has the same scaling behaviour (since it has \(n(n-1)/2\) linear factors); so \(q\) must be invariant under scaling the variables. This implies that \(q\) has to be constant.
Using the Leibniz formula, we see that the summand of \(\det(\mathbf{V}_{\vec{x}})\) corresponding to the identity permutation is the product of the diagonal entries of \(\mathbf{V}_{\vec{x}},\) that is, \(x_2(x_3)^2\cdots(x_n)^{n-1}\) (and no other term in the sum has this combination of exponents). Taking the first term in all factors of \(\prod_{1\leqslant i<j\leqslant n}(x_j-x_i),\) we also obtain \(x_2(x_3)^2\cdots(x_n)^{n-1}\) (and no other term in the product has these exponents). Hence the constant \(q\) must be 1, and so \(\det(\mathbf{V}_{\vec{x}})=\prod_{1\leqslant i<j\leqslant n}(x_j-x_i),\) as claimed.
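The identity is easy to test numerically; a sketch (assuming NumPy, with arbitrarily chosen pairwise distinct sample points):

```python
import numpy as np
from itertools import combinations

x = np.array([0.5, 2.0, -1.0, 3.0])   # arbitrary pairwise distinct sample points
V = np.vander(x, increasing=True)      # V_ij = x_i^(j-1), as in the definition

product = np.prod([x[j] - x[i] for i, j in combinations(range(len(x)), 2)])
assert np.isclose(np.linalg.det(V), product)
```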
10.4 Cramer’s rule
The determinant can be used to give a formula for the solution of a linear system of equations of the form \(\mathbf{A}\vec{x}=\vec{b}\) for an invertible matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) \(\vec{b} \in \mathbb{K}^n\) and unknowns \(\vec{x} \in \mathbb{K}^n.\) This formula is often referred to as Cramer’s rule. In order to derive it we start with definitions:
Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) be a square matrix. The adjugate matrix of \(\mathbf{A}\) is the \(n\times n\)-matrix \(\operatorname{Adj}(\mathbf{A})\) whose entries are given by (notice the reverse order of \(i\) and \(j\) on the right hand side) \[[\operatorname{Adj}(\mathbf{A})]_{ij}=(-1)^{i+j}\det\left(\mathbf{A}^{(j,i)}\right), \qquad 1\leqslant i,j\leqslant n.\]
\[\operatorname{Adj}\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right)=\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \operatorname{Adj}\left(\begin{pmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix}\right)=\begin{pmatrix} 4 & -2 & -3 \\ 1 & 0 & -1 \\ -2 & 1 & 2 \end{pmatrix}\]
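The definition translates line by line into code; a sketch (assuming NumPy for the minors; `adjugate` is our own helper name):

```python
import numpy as np

def adjugate(A):
    """[Adj(A)]_ij = (-1)^(i+j) det(A^{(j,i)}) -- note the reversed indices:
    the minor deletes row j and column i."""
    n = A.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[1.0, 1.0, 2.0], [0.0, 2.0, 1.0], [1.0, 0.0, 2.0]])
print(np.round(adjugate(A)))        # reproduces the 3x3 example above
print(np.round(adjugate(A) @ A))    # det(A) * 1_3; here det(A) = 1, so the identity
```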
The determinant and the adjugate matrix provide a formula for the inverse of a matrix:
Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\) Then we have \[\operatorname{Adj}(\mathbf{A})\mathbf{A}=\mathbf{A}\operatorname{Adj}(\mathbf{A})=\det(\mathbf{A})\mathbf{1}_{n}.\] In particular, if \(\mathbf{A}\) is invertible then \[\mathbf{A}^{-1}=\frac{1}{\det \mathbf{A}}\operatorname{Adj}(\mathbf{A}).\]
Proof. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n}.\) For \(1\leqslant i\leqslant n\) we obtain for the \(i\)-th diagonal entry \[[\operatorname{Adj}(\mathbf{A})\mathbf{A}]_{ii}=\sum_{k=1}^n(-1)^{i+k}\det\left(\mathbf{A}^{(k,i)}\right)A_{ki}=\det(\mathbf{A}),\] where we use the Laplace expansion (9.5) of the determinant. The diagonal entries of \(\operatorname{Adj}(\mathbf{A})\mathbf{A}\) are thus all equal to \(\det \mathbf{A}.\) For \(1\leqslant i,j\leqslant n\) with \(i\neq j\) we have \[[\operatorname{Adj}(\mathbf{A})\mathbf{A}]_{ij}=\sum_{k=1}^n(-1)^{i+k}\left(\det \mathbf{A}^{(k,i)}\right)A_{kj}.\] We would like to interpret this last expression as a Laplace expansion. We consider a new matrix \(\hat{\mathbf{A}}=(\hat{A}_{kl})_{1\leqslant k,l\leqslant n}\) which is identical to \(\mathbf{A},\) except that the \(i\)-th column of \(\mathbf{A}\) is replaced with the \(j\)-th column of \(\mathbf{A},\) that is, for \(1\leqslant k,l\leqslant n,\) we have \[\tag{10.4} \hat{A}_{kl}=\left\{\begin{array}{cc} A_{kj}, & l=i,\\ A_{kl}, & l\neq i.\end{array}\right.\] Then, for all \(1\leqslant k \leqslant n\) we have \(\hat{\mathbf{A}}^{(k,i)}=\mathbf{A}^{(k,i)},\) since the only column in which \(\mathbf{A}\) and \(\hat{\mathbf{A}}\) differ is removed in \(\mathbf{A}^{(k,i)}.\) Using (10.4), the Laplace expansion of \(\hat{\mathbf{A}}\) with respect to the \(i\)-th column gives \[\begin{aligned} \det\hat{\mathbf{A}}&=\sum_{k=1}^n(-1)^{i+k}\hat{A}_{ki}\det\left(\hat{\mathbf{A}}^{(k,i)}\right)=\sum_{k=1}^n(-1)^{i+k}\left(\det \mathbf{A}^{(k,i)}\right)A_{kj}\\ &=[\operatorname{Adj}(\mathbf{A})\mathbf{A}]_{ij}. \end{aligned}\] In \(\hat{\mathbf{A}}\) the \(j\)-th column of \(\mathbf{A}\) occurs twice (as its \(i\)-th and its \(j\)-th column), hence the column vectors of \(\hat{\mathbf{A}}\) are linearly dependent. Therefore \(\hat{\mathbf{A}}\) is not invertible by Proposition 3.18 and so \(\det\hat{\mathbf{A}}=[\operatorname{Adj}(\mathbf{A})\mathbf{A}]_{ij}=0\) by Corollary 10.2. The off-diagonal entries of \(\operatorname{Adj}(\mathbf{A})\mathbf{A}\) are thus all zero and we conclude \(\operatorname{Adj}(\mathbf{A})\mathbf{A}=\det(\mathbf{A})\mathbf{1}_{n}.\) Using the row version of the Laplace expansion we can conclude analogously that \(\mathbf{A}\operatorname{Adj}(\mathbf{A})=\det(\mathbf{A})\mathbf{1}_{n}.\)
Finally, if \(\mathbf{A}\) is invertible, then \(\det \mathbf{A}\neq 0\) by Corollary 10.2, so that \(\mathbf{A}^{-1}=\operatorname{Adj}(\mathbf{A})/\det(\mathbf{A}),\) as claimed.
We now use Theorem 10.25 to obtain a formula for the solution of the linear system \(\mathbf{A}\vec{x}=\vec{b}\) for an invertible matrix \(\mathbf{A}.\) Multiplying from the left with \(\mathbf{A}^{-1},\) we get \[\vec{x}=\mathbf{A}^{-1}\vec{b}=\frac{1}{\det \mathbf{A}}\operatorname{Adj}(\mathbf{A})\vec{b}.\] Writing \(\vec{x}=(x_i)_{1\leqslant i\leqslant n},\) multiplication with \(\det \mathbf{A}\) gives for \(1\leqslant i \leqslant n\) \[x_i\det \mathbf{A}=\sum_{k=1}^n[\operatorname{Adj}(\mathbf{A})]_{ik}b_k=\sum_{k=1}^n(-1)^{i+k}\det\left(\mathbf{A}^{(k,i)}\right)b_k.\] We can again interpret the right hand side as a Laplace expansion of the matrix \(\hat{\mathbf{A}}_i\) obtained by replacing the \(i\)-th column of \(\mathbf{A}\) with \(\vec{b}\) and leaving \(\mathbf{A}\) unchanged otherwise. Hence, we have for all \(1\leqslant i\leqslant n\) \[x_i=\frac{\det \hat{\mathbf{A}}_i}{\det \mathbf{A}}.\] This formula is known as Cramer’s rule. While this is a neat formula, it is rarely used in computing solutions to linear systems of equations due to the complexity of computing determinants.
We consider the system \(\mathbf{A}\vec{x}=\vec{b}\) for \[\mathbf{A}=\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\qquad \text{and} \qquad \vec{b}=\begin{pmatrix} -2 \\ 2 \\ 4 \end{pmatrix}.\] Here we obtain \[\hat{\mathbf{A}}_1=\begin{pmatrix} -2 & 1 & 1 \\ 2 & 2 & 1 \\ 4 & 1 & 2 \end{pmatrix}, \quad \hat{\mathbf{A}}_2=\begin{pmatrix} 2 & -2 & 1 \\ 1 & 2 & 1 \\ 1 & 4 & 2 \end{pmatrix}, \quad \hat{\mathbf{A}}_3=\begin{pmatrix} 2 & 1 & -2 \\ 1 & 2 & 2 \\ 1 & 1 & 4 \end{pmatrix}.\] We compute \(\det\mathbf{A}=4,\) \(\det\hat{\mathbf{A}}_1=-12,\) \(\det\hat{\mathbf{A}}_2=4\) and \(\det\hat{\mathbf{A}}_3=12\) so that Cramer’s rule gives indeed the correct solution \[\vec{x}=\frac{1}{4}\begin{pmatrix} -12 \\ 4 \\ 12 \end{pmatrix}=\begin{pmatrix} -3 \\ 1 \\ 3\end{pmatrix}.\]
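The example can be replayed mechanically; a sketch of Cramer's rule in Python (assuming NumPy; `cramer` is our own name, and the method is only sensible for small systems):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b for invertible A via x_i = det(A_i) / det(A),
    where A_i is A with its i-th column replaced by b."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                 # the matrix called \hat{A}_i above
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])
b = np.array([-2.0, 2.0, 4.0])
print(cramer(A, b))                  # [-3.  1.  3.], as computed above
```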
Exercises
Let \[\sigma = \left(\begin{array}{rrrrr} 1 & 2 & 3 & 4 & 5 \\ 5 & 3 & 2 & 4 & 1 \end{array}\right),\qquad \tau = \left(\begin{array}{rrrrr} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{array}\right).\]
Compute \(\sigma \tau.\)
Express \(\sigma\) and \(\tau\) as products of transpositions.
(Sign of an \(n\)-cycle). Show that the permutation \(\sigma_n \in S_n\) defined by \(\sigma_n(i) = i + 1\) for \(1 \leqslant i \leqslant n-1,\) and \(\sigma_n(n) = 1,\) has \(\operatorname{sgn}(\sigma_n) = (-1)^{n + 1}.\)
Use the Leibniz formula to show that \[\det(\mathbf{A})=\det(\mathbf{A}^T)\] for all \(n\) and all \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\)
Solution
Let \(\mathbf{A}= (A_{ij})_{1\leqslant i,j\leqslant n}.\) We first note that \(\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})\) for all \(\sigma\in S_n,\) which immediately follows from \(\operatorname{sgn}(\sigma)\operatorname{sgn}(\sigma^{-1})=\operatorname{sgn}(\sigma\circ\sigma^{-1}) = \operatorname{sgn}(1) = 1.\) The Leibniz formula reads \[\det(\mathbf{A}) = \sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n A_{\sigma(i)i}.\] We write \(\pi=\sigma^{-1}\) and let \(j=\sigma(i).\) Summing over all permutations is the same as summing over all inverse permutations and hence we find \[\begin{aligned} \det(\mathbf{A}) & = \sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)\prod_{j=1}^n A_{j\pi(j)}\\ & = \sum_{\pi\in S_n}\operatorname{sgn}(\pi)\prod_{j=1}^n A_{j\pi(j)}\\ & = \sum_{\pi\in S_n}\operatorname{sgn}(\pi)\prod_{j=1}^n [\mathbf{A}^T]_{\pi(j)j}\\ & = \det(\mathbf{A}^T). \end{aligned}\]
Let \(\mathbf{A}= \begin{pmatrix} -1 & -2 & 11 \\ 1 & -1 & -1 \\ 6 & 0 & 0 \end{pmatrix}.\) Compute \(\operatorname{Adj}\mathbf{A},\) and hence \(\det \mathbf{A}.\)
If \(\sigma\) is a permutation, then \(\mathbf P_\sigma^{-1}=\mathbf P_{\sigma}.\)
- True
- False
If \(\sigma\in S_n\) then \(\sigma^n = 1.\)
- True
- False
If \(\sigma,\tau\in S_n\) then \(\sigma^n = \tau^n\) implies \(\sigma=\tau.\)
- True
- False
If \(\sigma,\tau\in S_n\) then \(\sigma\circ\tau=1\) implies \(\tau\circ\sigma=1.\)
- True
- False
It holds that \(|\det(\mathbf{A})|=\sqrt{\det(\mathbf{A}^T\mathbf{A})}\) for all \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\)
- True
- False
Let \(\mathbf{A}\in M_{2n+1,2n+1}(\mathbb{K})\) be anti-symmetric. Then \(\det(\mathbf{A})=0.\)
- True
- False
No anti-symmetric matrix \(\mathbf{A}\) is invertible.
- True
- False
If \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is such that \(\mathbf{A}^T\mathbf{A}=\mathbf{1}_{n},\) then \(\det(\mathbf{A})=1.\)
- True
- False
A matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is invertible if and only if \(\operatorname{Adj}(\mathbf{A})\) is.
- True
- False
Given \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K}),\) then \(\det(\mathbf{A})\det(\mathbf{B})=\det(\mathbf{B})\det(\mathbf{A}).\)
- True
- False
Given \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K}),\) then \(\det(\mathbf{A}\mathbf{B})=\det(\mathbf{A})+\det(\mathbf{B}).\)
- True
- False
A square matrix is non-invertible if and only if its transpose is non-invertible.
- True
- False
The matrix \(\begin{pmatrix}a & b\\ 5 & b\end{pmatrix}\) is invertible if and only if \(a\ne5\) and \(b\ne0.\)
- True
- False
The matrix \(\begin{pmatrix}x+\mathrm i & 0\\ 0 & x-\mathrm i\end{pmatrix}\) is invertible for all \(x\in\mathbb{R}.\)
- True
- False
If \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) where \(m\ne n,\) then \(\det(\mathbf{A}^T\mathbf{A})=\det(\mathbf{A}\mathbf{A}^T).\)
- True
- False
It holds that \(\operatorname{Adj}(\mathbf{A}^T)=\operatorname{Adj}(\mathbf{A})^T\) for all \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\)
- True
- False
If \(\mathbf{B}\in M_{n,n}(\mathbb{K})\) is the reduced row echelon form of \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) then \(\det(\mathbf{A})=\det(\mathbf{B}).\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) and \(\mathbf{B}\) is invertible, then \(\det(\mathbf{B}\mathbf{A}\mathbf{B}^{-1})=\det(\mathbf{A}).\)
- True
- False