2 Matrices, I
In this chapter we’ll recall some things you learned in “Algorithmics” about matrices and vectors, and learn some new properties.
Throughout the rest of this module, \(\mathbb{K}\) stands for an arbitrary field. It won’t ever matter which field it is; so you can assume \(\mathbb{K}= \mathbb{R}\) (or \(\mathbb{C}\)) throughout if you prefer.
2.1 Definitions
We start with some definitions. In this chapter, \(m, n, p, q, r\) denote natural numbers.
A matrix (plural matrices) is simply a rectangular block of numbers. More precisely:
A rectangular block of scalars \(A_{ij} \in \mathbb{K},\) \(1\leqslant i\leqslant m,1\leqslant j\leqslant n\) \[\tag{2.1} \mathbf{A}=\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn}\end{pmatrix}\] is called an \(m\times n\) matrix with entries in \(\mathbb{K}.\)
We also say that \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix, that \(\mathbf{A}\) has size \(m\times n\) and that \(\mathbf{A}\) has \(m\) rows and \(n\) columns.
The entry \(A_{ij}\) of \(\mathbf{A}\) is said to have row index \(i\) where \(1\leqslant i\leqslant m,\) column index \(j\) where \(1\leqslant j\leqslant n\) and will be referred to as the \((i,j)\)-th entry of \(\mathbf{A}.\)
A shorthand notation for (2.1) is \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\)
For matrices \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) we write \(\mathbf{A}=\mathbf{B},\) provided \(A_{ij}=B_{ij}\) for all \(1\leqslant i\leqslant m\) and all \(1\leqslant j \leqslant n.\)
The set of \(m\)-by-\(n\) matrices with entries in \(\mathbb{K}\) will be denoted by \(M_{m,n}(\mathbb{K}).\)
The elements of the set \(M_{m,1}(\mathbb{K})\) are called column vectors of length \(m\) and the elements of the set \(M_{1,n}(\mathbb{K})\) are called row vectors of length \(n\).
We will use the Latin alphabet for column vectors and decorate them with an arrow. For a column vector \[\vec{x}=\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m\end{pmatrix} \in M_{m,1}(\mathbb{K})\] we also use the shorthand notation \(\vec{x}=(x_i)_{1\leqslant i\leqslant m}\) and we write \([\vec{x}]_i\) for the \(i\)-th entry of \(\vec{x},\) so that \([\vec{x}]_i=x_i\) for all \(1\leqslant i\leqslant m.\)
We will use the Greek alphabet for row vectors and decorate them with an arrow. For a row vector \[\vec{\xi}=\begin{pmatrix} \xi_1 & \xi_2 & \cdots & \xi_n\end{pmatrix} \in M_{1,n}(\mathbb{K})\] we also use the shorthand notation \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\) and we write \([\vec{\xi}]_i\) for the \(i\)-th entry of \(\vec{\xi},\) so that \([\vec{\xi}]_i=\xi_i\) for all \(1\leqslant i\leqslant n.\)
A matrix is always denoted by a bold capital letter, such as \(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}.\)
The entries of the matrix are denoted by \(A_{ij},B_{ij},C_{ij},D_{ij},\) respectively.
We may think of an \(m\times n\) matrix as consisting of \(n\) column vectors of length \(m.\) The column vectors of the matrix are denoted by \(\vec{a}_j,\vec{b}_j,\vec{c}_j,\vec{d}_j\) \((1\leqslant j\leqslant n),\) respectively.
We may think of an \(m\times n\) matrix as consisting of \(m\) row vectors of length \(n.\) The row vectors of the matrix are denoted by \(\vec{\alpha}_i,\vec{\beta}_i,\vec{\gamma}_i,\vec{\delta}_i\) \((1\leqslant i\leqslant m),\) respectively.
For a matrix \(\mathbf{A}\) we also write \([\mathbf{A}]_{ij}\) for the \((i,j)\)-th entry of \(\mathbf{A}.\) So for \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) we have \([\mathbf{A}]_{ij}=A_{ij}\) for all \(1\leqslant i\leqslant m, 1\leqslant j \leqslant n.\)
For \[\mathbf{A}=\begin{pmatrix} \pi & \sqrt{2} \\ -1 & 5/3 \\ \log 2 & 3 \end{pmatrix} \in M_{3,2}(\mathbb{R}),\] we have for instance \([\mathbf{A}]_{32}=3,\) \([\mathbf{A}]_{12}=\sqrt{2},\) \([\mathbf{A}]_{21}=-1\) and \[\vec{a}_1=\begin{pmatrix} \pi \\ -1 \\ \log 2 \end{pmatrix},\quad \vec{a}_2=\begin{pmatrix} \sqrt{2} \\ 5/3\\ 3 \end{pmatrix}, \quad \vec{\alpha}_2=\begin{pmatrix} -1 & 5/3 \end{pmatrix},\quad \vec{\alpha}_3=\begin{pmatrix} \log 2 & 3 \end{pmatrix}.\]
We’ll use the shorthand notation \(\mathbb{K}^n\) for column vectors with \(n\) entries (i.e. \(M_{n, 1}(\mathbb{K})\)), and \(\mathbb{K}_n\) for row vectors \(M_{1, n}(\mathbb{K}).\) We can of course go back and forth between them, since there’s a bijective map from \(\mathbb{K}_n\) to \(\mathbb{K}^n\) sending \((x_1\ \dots\ x_n)\) to \(\begin{pmatrix} x_1 \\ \vdots\\ x_n\end{pmatrix}\); but it’s helpful to think of them as separate, since they will interact differently with matrix multiplication.
The zero matrix \(\mathbf{0}_{m,n}\) is the \(m\times n\) matrix whose entries are all zero. We will also write \(\mathbf{0}_n\) for the \(n\times n\) matrix whose entries are all zero.
Matrices with equal number \(n\) of rows and columns are known as square matrices.
An entry \(A_{ij}\) of a square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is said to be a diagonal entry if \(i=j\) and an off-diagonal entry otherwise. A matrix whose off-diagonal entries are all zero is said to be diagonal. (The notion of “diagonal” sort of makes sense for non-square matrices too, but it’s most useful in the square case.)
We write \(\mathbf{1}_{n}\) for the diagonal \(n\times n\) matrix whose diagonal entries are all equal to \(1.\) Using the so-called Kronecker delta defined by the rule \[\delta_{ij}=\left\{\begin{array}{cc} 1 & i=j, \\ 0 & i\neq j,\end{array}\right.\] we have \([\mathbf{1}_{n}]_{ij}=\delta_{ij}\) for all \(1\leqslant i,j\leqslant n.\) The matrix \(\mathbf{1}_{n}\) is called the unit matrix or identity matrix of size \(n.\)
The standard basis of \(\mathbb{K}^n\) is the set \(\{\vec{e}_1,\vec{e}_2,\ldots,\vec{e}_n\}\) consisting of the column vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)
The standard basis of \(\mathbb{K}_n\) is the set \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\ldots,\vec{\varepsilon}_n\}\) consisting of the row vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)
Special matrices: \[\mathbf{0}_{2,3}=\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{1}_{2}=\begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}, \quad \qquad \mathbf{1}_{3}=\begin{pmatrix} 1 & 0 &0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}.\]
The standard basis of \(\mathbb{K}^3\) is \(\{\vec{e}_1,\vec{e}_2,\vec{e}_3\},\) where \[\vec{e}_1=\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \vec{e}_2=\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \vec{e}_3=\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.\]
The standard basis of \(\mathbb{K}_3\) is \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\vec{\varepsilon}_3\},\) where \[\vec{\varepsilon}_1=\begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \quad \vec{\varepsilon}_2=\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \vec{\varepsilon}_3=\begin{pmatrix} 0 & 0 & 1 \end{pmatrix}.\]
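To make these definitions concrete, here is a minimal Python sketch (our own illustration, not part of the formal development; all function names are made up for this purpose) that builds the zero matrix, the identity matrix via the Kronecker delta, and the standard basis vectors, representing a matrix as a list of its rows:

```python
def zero_matrix(m, n):
    """The zero matrix 0_{m,n}: an m-by-n block of zeros."""
    return [[0 for _ in range(n)] for _ in range(m)]

def delta(i, j):
    """Kronecker delta: 1 if i == j, 0 otherwise."""
    return 1 if i == j else 0

def identity_matrix(n):
    """The identity matrix 1_n, whose (i, j) entry is delta(i, j)."""
    return [[delta(i, j) for j in range(n)] for i in range(n)]

def standard_basis_column(n, i):
    """The i-th standard basis column vector e_i of K^n (1-indexed),
    i.e. the i-th column of identity_matrix(n)."""
    return [[delta(i, k)] for k in range(1, n + 1)]

print(identity_matrix(3))           # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(standard_basis_column(3, 2))  # [[0], [1], [0]]
```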
2.2 Arithmetic with matrices
Matrix addition
The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) of identical size is defined as follows:
Addition in \(M_{m,n}(\mathbb{K})\) is the map \[+_{M_{m,n}(\mathbb{K})} : M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (\mathbf{A},\mathbf{B})\mapsto \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}\] defined by the rule \[\tag{2.2} \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}=(A_{ij}+_{\mathbb{K}}B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(A_{ij}+_{\mathbb{K}}B_{ij}\) denotes the field addition of scalars \(A_{ij},B_{ij}\in \mathbb{K}.\)
Field addition takes two scalars and produces another scalar, thus it is a map \(\mathbb{K}\times \mathbb{K}\to \mathbb{K},\) whereas addition of matrices is a map \(M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}).\) For this reason we wrote \(+_{M_{m,n}(\mathbb{K})}\) above in order to distinguish matrix addition from field addition of scalars. Of course, it is quite cumbersome to always write \(+_{M_{m,n}(\mathbb{K})}\) and \(+_{\mathbb{K}},\) so we follow the usual custom of writing “\(+\)” both for field addition of scalars and for matrix addition, trusting that the reader is aware of the difference.
(Similarly, the notations \(\vec{\varepsilon}_1\) etc. for standard basis vectors are slightly ambiguous, since we haven’t specified \(n\): \((1\quad 0)\) and \((1 \quad 0 \quad 0)\) are not literally the same vector. Whenever we use these notations it will always be clear from context what \(n\) is.)
Scalar multiplication of a matrix
We can multiply a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) by a scalar \(s \in \mathbb{K}.\) This amounts to multiplying each entry of \(\mathbf{A}\) by \(s\):
Scalar multiplication in \(M_{m,n}(\mathbb{K})\) is the map \[\cdot_{M_{m,n}(\mathbb{K})} : \mathbb{K}\times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (s,\mathbf{A}) \mapsto s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}\] defined by the rule \[\tag{2.3} s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=(s\cdot_{\mathbb{K}} A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(s\cdot_{\mathbb{K}} A_{ij}\) denotes the field multiplication of scalars \(s,A_{ij} \in \mathbb{K}.\)
Here we multiply by \(s\) from the left. Likewise, we define \(\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s=(A_{ij}\cdot_{\mathbb{K}} s)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) that is, we multiply from the right. Of course, since multiplication of scalars is commutative, we have \(s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s,\) that is, left multiplication and right multiplication give the same matrix. However, in a moment we’ll encounter a more general kind of multiplication where this isn’t true.
Multiplication of a matrix by a scalar: \[5\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}5=\begin{pmatrix} 5\cdot 1 & 5\cdot 2 \\ 5\cdot 3 & 5\cdot 4 \end{pmatrix}=\begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix}.\]
Addition of matrices:
\[\begin{pmatrix} 3 & -5 \\ -2 & 8 \end{pmatrix} + \begin{pmatrix} -3 & 8 \\ 7 & 10 \end{pmatrix}=\begin{pmatrix} 0 & 3 \\ 5 & 18 \end{pmatrix}.\]
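If it helps to see these rules computationally, here is a minimal Python sketch of rules (2.2) and (2.3) (a didactic illustration assuming matrices are stored as lists of rows; mat_add and scalar_mul are names of our choosing):

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of identical size, as in rule (2.2)."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must agree"
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_mul(s, A):
    """Multiply every entry of A by the scalar s, as in rule (2.3)."""
    return [[s * A[i][j] for j in range(len(A[0]))] for i in range(len(A))]

# The two worked examples above:
print(scalar_mul(5, [[1, 2], [3, 4]]))                  # [[5, 10], [15, 20]]
print(mat_add([[3, -5], [-2, 8]], [[-3, 8], [7, 10]]))  # [[0, 3], [5, 18]]
```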
In particular, since row vectors and column vectors are just special types of matrix, we can multiply those by scalars; and we have the following easy but useful lemma:
For any row vector \(\vec{\xi} = (\xi_1\quad \xi_2\quad\dots\quad \xi_n) \in \mathbb{K}_n\) we have \(\vec{\xi} = \sum_{j = 1}^n\xi_j \vec{\varepsilon}_j,\) and similarly for column vectors.
Proof. Clear: the \(j\)-th entry of \(\sum_{i=1}^n\xi_i \vec{\varepsilon}_i\) is \(\sum_{i=1}^n\xi_i[\vec{\varepsilon}_i]_j=\sum_{i=1}^n\xi_i\delta_{ij}=\xi_j.\)
Matrix multiplication
If the number of columns of a matrix \(\mathbf{A}\) is equal to the number of rows of a matrix \(\mathbf{B},\) we define the matrix product \(\mathbf{A}\mathbf{B}\) of \(\mathbf{A}\) and \(\mathbf{B}\) as follows:
Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) be an \(m\)-by-\(n\) matrix and \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) be an \(n\)-by-\(r\) matrix. The matrix product of \(\mathbf{A}\) and \(\mathbf{B}\) is the \(m\)-by-\(r\) matrix \(\mathbf{A}\mathbf{B}\in M_{m,r}(\mathbb{K})\) whose entries are defined by the rule \[[\mathbf{A}\mathbf{B}]_{ik}=A_{i1}B_{1k}+A_{i2}B_{2k}+\cdots +A_{in}B_{nk}=\sum_{j=1}^nA_{ij}B_{jk}=\sum_{j=1}^n [\mathbf{A}]_{ij}[\mathbf{B}]_{jk}\] for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant r.\)
This definition might seem a little weird and arbitrary at first sight, but will turn out to give us a nice theory. (Perhaps the best motivation for it will come much later in the module, in Theorem 7.6.)
If \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix and \(\mathbf{B}\) an \(n\)-by-\(m\) matrix, then both \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined, but in general \(\mathbf{A}\mathbf{B}\neq \mathbf{B}\mathbf{A}.\) In general \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) aren’t even the same size, since \(\mathbf{A}\mathbf{B}\) is an \(m\)-by-\(m\) matrix and \(\mathbf{B}\mathbf{A}\) is an \(n\)-by-\(n\) matrix. Even when \(n=m,\) so that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are the same size, it is still false in general that \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}.\)
We may define a pairing \(\mathbb{K}_n \times \mathbb{K}^n \to \mathbb{K}\) of a row vector of length \(n\) and a column vector of length \(n\) by the rule \[(\vec{\xi},\vec{x})\mapsto \vec{\xi}\cdot \vec{x}=\xi_1x_1+\xi_2x_2+\cdots+\xi_n x_n\] for all \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\in \mathbb{K}_n\) and for all \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{K}^n.\) So we multiply the first entry of \(\vec{\xi}\) by the first entry of \(\vec{x},\) add the product of the second entries, and continue in this fashion, finishing with the product of the last entries of \(\vec{\xi}\) and \(\vec{x}.\)
The \((i,j)\)-th entry of the matrix product of \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) is then given by the pairing \[[\mathbf{A}\mathbf{B}]_{ij}=\vec{\alpha}_i\vec{b}_j\] of the \(i\)-th row vector \(\vec{\alpha}_i\) of \(\mathbf{A}\) and the \(j\)-th column vector \(\vec{b}_j\) of \(\mathbf{B}.\)
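In code, the definition reads as follows (a sketch in the same illustrative style as before; pairing and mat_mul are our own names): the \((i,k)\) entry of the product is the pairing of the \(i\)-th row of \(\mathbf{A}\) with the \(k\)-th column of \(\mathbf{B}.\)

```python
def pairing(xi, x):
    """Pairing of a row vector xi with a column vector x (plain lists):
    xi_1*x_1 + xi_2*x_2 + ... + xi_n*x_n."""
    return sum(p * q for p, q in zip(xi, x))

def mat_mul(A, B):
    """Product of an m-by-n matrix A with an n-by-r matrix B.
    Entry (i, k) of AB is the pairing of row i of A with column k of B."""
    n = len(B)
    assert len(A[0]) == n, "columns of A must equal rows of B"
    r = len(B[0])
    return [[pairing(row, [B[j][k] for j in range(n)]) for k in range(r)]
            for row in A]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(mat_mul(A, B))  # [[2, 1], [1, 1]]
print(mat_mul(B, A))  # [[1, 1], [1, 2]] -- note that AB != BA
```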
Properties
Here is a long list of (mostly quite straightforward) properties of these matrix operations. Here \(\mathbb{K}\) is any field, \(m, n, p, q, r\) are natural numbers, and \(s, s_1, s_2\) denote scalars in \(\mathbb{K}.\)
(i) \(\mathbf{0}_{m,n}+\mathbf{A}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(ii) \(\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}\) and \((\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})\) for all \(\mathbf{A},\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)
(iii) \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}\) and \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(iv) \(\mathbf{0}_{r,m}\mathbf{A}=\mathbf{0}_{r,n}\) and \(\mathbf{A}\mathbf{0}_{n,r}=\mathbf{0}_{m,r}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(v) (“Associativity of multiplication”) \((\mathbf{A}\mathbf{B}) \mathbf{C}= \mathbf{A}(\mathbf{B}\mathbf{C})\) for all \(\mathbf{A}\in M_{m, p}(\mathbb{K}),\) \(\mathbf{B}\in M_{p, q}(\mathbb{K})\) and \(\mathbf{C}\in M_{q, n}(\mathbb{K})\);
(vi) \((\mathbf{B}+\mathbf{C})\mathbf{A}=\mathbf{B}\mathbf{A}+\mathbf{C}\mathbf{A}\) for all \(\mathbf{B},\mathbf{C}\in M_{r,m}(\mathbb{K})\) and for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(vii) \(\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}\) for all \(\mathbf{A}\in M_{r,m}(\mathbb{K})\) and for all \(\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)
(viii) \(0_{\mathbb{K}} \cdot \mathbf{A}=\mathbf{0}_{m,n}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(ix) \((s_1s_2)\mathbf{A}=s_1(s_2 \mathbf{A})\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(s_1,s_2 \in \mathbb{K}\);
(x) \(\mathbf{A}(s \mathbf{B})=s(\mathbf{A}\mathbf{B})=(s \mathbf{A})\mathbf{B}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) and all \(s \in \mathbb{K}\);
(xi) \(s(\mathbf{A}+\mathbf{B})=s \mathbf{A}+s \mathbf{B}\) for all \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(s \in \mathbb{K}\);
(xii) \((s_1+s_2)\mathbf{A}=s_1\mathbf{A}+s_2\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and for all \(s_1,s_2 \in \mathbb{K}.\)
Some of these only involve one of our three matrix operations, and some involve several. (Exercise: make a table showing which properties involve which operations!)
Because of (ii) and (v), we don’t need to write brackets when we deal with sums (or products) of three or more matrices – we can just write \(\mathbf{A}+ \mathbf{B}+ \mathbf{C}\) or \(\mathbf{A}\mathbf{B}\mathbf{C},\) assuming the matrices are of compatible sizes so the operations make sense (and similarly for four or more matrices).
Proof. As a sample, we show properties (iii) and (vii), which are quite easy, and (v), which is slightly harder. The proofs of the remaining ones are similar and/or elementary consequences of the properties of addition and multiplication of scalars.
To show property (iii), consider \(\mathbf{A}\in M_{m,n}(\mathbb{K}).\) Then, by definition, we have for all \(1\leqslant k\leqslant m\) and all \(1\leqslant j\leqslant n\) \[[\mathbf{1}_{m}\mathbf{A}]_{kj}=\sum_{i=1}^m[\mathbf{1}_{m}]_{ki}[\mathbf{A}]_{ij}=\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}=[\mathbf{A}]_{kj},\] where the second-to-last equality uses that \(\delta_{ki}\) is \(0\) unless \(i=k,\) in which case \(\delta_{kk}=1.\) We conclude that \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}.\) Likewise, we obtain for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant n\) \[[\mathbf{A}\mathbf{1}_{n}]_{ik}=\sum_{j=1}^n[\mathbf{A}]_{ij}[\mathbf{1}_{n}]_{jk}=\sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}=[\mathbf{A}]_{ik},\] so that \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}.\) The identities \[\boxed{\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}\qquad \text{and}\qquad \sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}}\] are used repeatedly in Linear Algebra, so make sure you understand them.
For property (vii), applying the definition of matrix multiplication gives \[\mathbf{A}\mathbf{B}=\left(\sum_{i=1}^m A_{ki}B_{ij}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}\quad \text{and}\quad \mathbf{A}\mathbf{C}=\left(\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n},\] so that \[\begin{aligned} \mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}&=\left(\sum_{i=1}^m A_{ki}B_{i j}+\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}\\&=\left(\sum_{i=1}^m A_{ki}\left(B_{i j}+C_{i j}\right)\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}=\mathbf{A}(\mathbf{B}+\mathbf{C}), \end{aligned}\] where we use that \[\mathbf{B}+\mathbf{C}=\left(B_{ij}+C_{ij}\right)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\]
For property (v), writing \(A_{ij} = [\mathbf{A}]_{ij}\) and so on, we have \[[ (\mathbf{A}\mathbf{B}) \mathbf{C}]_{ij} = \sum_{b = 1}^q [\mathbf{A}\mathbf{B}]_{ib} [\mathbf{C}]_{bj} = \sum_{b = 1}^q \left(\sum_{a = 1}^p A_{ia} B_{ab}\right) C_{bj}.\] This is the sum over all possible products \(A_{ia} B_{ab} C_{bj}\) (with \(i, j\) fixed and \(a, b\) varying); we can group these terms in whichever order we like, so we take the sum over \(a\) to the outside: \[\dots = \sum_{a = 1}^pA_{ia} \left(\sum_{b = 1}^q B_{ab} C_{bj}\right)= \sum_{a = 1}^p[\mathbf{A}]_{ia} [\mathbf{B}\mathbf{C}]_{aj} = [ \mathbf{A}(\mathbf{B}\mathbf{C})]_{ij}\] as required.
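The index manipulations in the proof of (v) are easy to get wrong, so here is a quick numerical sanity check (not a proof!), reusing mat_mul from the sketch above on randomly chosen matrices of compatible sizes:

```python
import random

def random_matrix(m, n):
    """An m-by-n matrix with small random integer entries (for testing)."""
    return [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]

for _ in range(100):
    A = random_matrix(2, 3)
    B = random_matrix(3, 4)
    C = random_matrix(4, 2)
    assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
print("(AB)C == A(BC) held for every sampled triple")
```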
2.3 Transpose and inverse
Transpose
Finally, we may flip a matrix along its “diagonal entries”, that is, we interchange the role of rows and columns. More precisely:
The transpose of a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is the matrix \(\mathbf{A}^{T} \in M_{n,m}(\mathbb{K})\) satisfying \[\left[\mathbf{A}^T\right]_{ij}=[\mathbf{A}]_{ji}\] for all \(1\leqslant i\leqslant n\) and \(1\leqslant j\leqslant m.\)
A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=\mathbf{A}^T\) is called symmetric.
A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=-\mathbf{A}^T\) is called anti-symmetric.
If \[\mathbf{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6\end{pmatrix}, \quad \text{then} \quad \mathbf{A}^T=\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6\end{pmatrix}.\]
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we have by definition \((\mathbf{A}^T)^T=\mathbf{A}.\)
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,r}(\mathbb{K}),\) we have \[(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T.\] Indeed, by definition we have for all \(1\leqslant i\leqslant r\) and all \(1\leqslant j\leqslant m\) \[\left[(\mathbf{A}\mathbf{B})^T\right]_{ij}=[\mathbf{A}\mathbf{B}]_{ji}=\sum_{k=1}^n [\mathbf{A}]_{jk}[\mathbf{B}]_{ki}=\sum_{k=1}^n\left[\mathbf{B}^T\right]_{ik}\left[\mathbf{A}^T\right]_{kj}=\left[\mathbf{B}^T\mathbf{A}^T\right]_{ij}.\]
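Both facts are easy to check numerically; the following sketch (transpose is again our own name, and mat_mul is reused from the earlier sketch) illustrates \((\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T\) on the example above:

```python
def transpose(A):
    """The transpose A^T: entry (i, j) of A^T equals entry (j, i) of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[1, 2], [3, 4], [5, 6]]   # the 3-by-2 example above
print(transpose(A))            # [[1, 3, 5], [2, 4, 6]]

B = [[1, 0, 2], [0, 1, 1]]     # any 2-by-3 matrix, so that AB is defined
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
assert transpose(transpose(A)) == A   # (A^T)^T = A
```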
Transposes won’t play a very big role in this module, but they will be much more important in Linear Algebra II when you start studying bilinear mappings.
Inverses
Let \(\mathbf{A}\in M_{m, n}(\mathbb{K})\) be a matrix. If there exists a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}\) and \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n},\) then we say \(\mathbf{A}\) is invertible.
If such a matrix exists (for a given \(\mathbf{A}\)) then it’s unique (see Exercises). So we can denote this unique matrix (if it exists!) by \(\mathbf{A}^{-1},\) and we call it the inverse of \(\mathbf{A}.\)
If \(\mathbf{A}\in M_{m, n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n, r}(\mathbb{K})\) are both invertible, then \(\mathbf{A}\mathbf{B}\) is invertible and \((\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.\)
Proof. We compute (using part (v) of Proposition 2.16 repeatedly) that \((\mathbf{B}^{-1} \mathbf{A}^{-1}) (\mathbf{A}\mathbf{B}) = \mathbf{B}^{-1} (\mathbf{A}^{-1} \mathbf{A}) \mathbf{B}= \mathbf{B}^{-1} \mathbf{1}_{n} \mathbf{B}= \mathbf{B}^{-1} \mathbf{B}= \mathbf{1}_{r},\) and similarly \((\mathbf{A}\mathbf{B}) \cdot (\mathbf{B}^{-1} \mathbf{A}^{-1}) = \mathbf{1}_{m}.\)
You saw these definitions already in Algorithmics, assuming \(\mathbb{K}= \mathbb{R}\) and, more importantly, that \(m = n.\) The definition still makes logical sense if \(m \ne n,\) but we’ll see later that it is never satisfied – it is a theorem that a non-square matrix cannot be invertible.
Exercises
See the website https://apptest.fernuni.ch/ for worked solutions.
Find two \(2 \times 2\) matrices \(\mathbf{A}\) and \(\mathbf{B}\) with \(\mathbf{A}\mathbf{B}\ne \mathbf{B}\mathbf{A}.\)
Solution
Consider \(\mathbf{A}= \left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)\) and \(\mathbf{B}= \left(\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right).\) Then we compute that \[\mathbf{A}\mathbf{B}= \left(\begin{smallmatrix}2 & 1 \\ 1 & 1 \end{smallmatrix}\right), \qquad \mathbf{B}\mathbf{A}= \left(\begin{smallmatrix}1 & 1 \\ 1 & 2 \end{smallmatrix}\right).\]
Let \(a,b,c,d \in \mathbb{K}\) and \[\mathbf{A}=\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_{2,2}(\mathbb{K}).\] Show that \(\mathbf{A}\) has an inverse \(\mathbf{A}^{-1}\) if and only if \(ad-bc\neq 0.\) For \(ad-bc\neq 0,\) compute the inverse \(\mathbf{A}^{-1}.\)
Solution
Suppose the matrix \(\mathbf{A}\) had an inverse \[\mathbf{A}^{-1}=\begin{pmatrix} u & v \\ w & x \end{pmatrix}.\] Then \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{1}_{2},\) which amounts to the equations \[\begin{aligned} au+bw & = 1, \\ av+bx & = 0,\\ cu+dw & = 0,\\ cv+dx &= 1. \end{aligned}\] Multiplying the first equation by \(c\) and using the third equation leads to \(c=-(ad-bc)w.\) Similarly we find \[\begin{aligned} a& =(ad-bc)x,\\ b & =-(ad-bc)v,\\ c & =-(ad-bc)w,\\ d & =(ad-bc)u. \end{aligned}\] If \(ad-bc=0,\) these equations force \(a=b=c=d=0,\) contradicting the first equation \(au+bw=1\); so no inverse can exist in that case. Conversely, a direct computation shows that \[\begin{pmatrix}a & b \\ c & d\end{pmatrix}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix} = \begin{pmatrix}ad-bc & 0 \\ 0 & ad-bc\end{pmatrix} = \begin{pmatrix}d & -b \\ -c & a\end{pmatrix}\begin{pmatrix}a & b \\ c & d\end{pmatrix},\] so if \(ad-bc\neq 0,\) then \(\mathbf{A}\) is invertible with inverse \[\mathbf{A}^{-1}= \frac{1}{ad-bc}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix}.\]
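As a sanity check on the formula, here is a short Python sketch (inverse_2x2 is our own name; we use fractions.Fraction for exact arithmetic and reuse mat_mul from the earlier sketch):

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2-by-2 matrix by the formula above, or None if ad - bc = 0."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        return None          # no inverse exists when ad - bc = 0
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

A = [[1, 2], [3, 4]]         # here ad - bc = -2, so A is invertible
Ainv = inverse_2x2(A)        # [[-2, 1], [3/2, -1/2]] as Fractions
assert mat_mul(A, Ainv) == [[1, 0], [0, 1]]   # A A^{-1} = 1_2
assert mat_mul(Ainv, A) == [[1, 0], [0, 1]]   # A^{-1} A = 1_2
```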
Let \(\mathbf{A}= (1 \quad 0) \in M_{1, 2}(\mathbb{R}).\)
(i) Find a matrix \(\mathbf{B}\in M_{2, 1}(\mathbb{R})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{1}.\)
(ii) Is the matrix \(\mathbf{B}\) from (i) uniquely determined?
(iii) Show that there does not exist a matrix \(\mathbf{B}\in M_{2, 1}(\mathbb{R})\) with \(\mathbf{B}\mathbf{A}= \mathbf{1}_{2}.\)
(We say that \(\mathbf{A}\) has a right inverse but not a left inverse.)
Solution
(i) The matrix \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\) will work.
(ii) No, because in fact the matrix \(\begin{pmatrix} 1 \\ x \end{pmatrix}\) will work for any value of \(x.\)
(iii) If \(\mathbf{B}= \begin{pmatrix} x \\ y \end{pmatrix},\) then we compute \[\mathbf{B}\mathbf{A}= \begin{pmatrix} x & 0 \\ y & 0 \end{pmatrix},\] so the right-hand column of \(\mathbf{B}\mathbf{A}\) is zero for any \(\mathbf{B}\); hence this matrix cannot equal \(\mathbf{1}_{2}.\)
Let \(\mathbf{A}\in M_{m, n}(\mathbb{K}).\) Suppose there exist matrices \(\mathbf{B},\) \(\mathbf{B}' \in M_{n, m}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}\) and \(\mathbf{B}' \mathbf{A}= \mathbf{1}_{n}.\) Show that we must have \(\mathbf{B}= \mathbf{B}'.\) (Hint: Consider \(\mathbf{B}' \mathbf{A}\mathbf{B}.\)) Hence show that the inverse of a matrix is unique if it exists.
Solution
We have \[\begin{aligned} \mathbf{B}&= \mathbf{1}_{n} \mathbf{B}\\ &= (\mathbf{B}' \mathbf{A}) \mathbf{B}\quad\text{(assumption on $\mathbf{B}'$)} \\ &= \mathbf{B}' (\mathbf{A}\mathbf{B})\quad\text{(associativity)} \\ &= \mathbf{B}' \mathbf{1}_{m} \quad\text{(assumption on $\mathbf{B}$)}\\ &= \mathbf{B}'.\quad \text{QED}. \end{aligned}\] In particular, if \(\mathbf{B}\) and \(\mathbf{B}'\) are both inverses of \(\mathbf{A},\) then they satisfy \(\mathbf{A}\mathbf{B}= \mathbf{1}_m\) and \(\mathbf{B}' \mathbf{A}= \mathbf{1}_n,\) so we must have \(\mathbf{B}' = \mathbf{B}.\)
For each of the following statements, decide whether it is true or false.

If \(\mathbf A,\mathbf B\in M_{n,n}(\mathbb K),\) then \(\mathbf A\mathbf B = \mathbf{0}_n\) implies \(\mathbf A=\mathbf{0}_n\) or \(\mathbf B = \mathbf{0}_n.\)
- True
- False
If \(\mathbf A,\mathbf B\in M_{n,n}(\mathbb K)\) are such that \(\mathbf{A}\mathbf{B}=\mathbf{0}_n,\) then \(\mathbf{B}\mathbf{A}=\mathbf{0}_n.\)
- True
- False
If \(\mathbf D_1,\mathbf D_2\in M_{n,n}(\mathbb{K})\) are diagonal matrices, then \(\mathbf D_1\mathbf D_2=\mathbf D_2\mathbf D_1.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\) are matrices such that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined, then \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are of the same size.
- True
- False
If \(\mathbf{A},\mathbf{B}\) are matrices such that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined and \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are of the same size, then \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are such that \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A},\) then \(\mathbf{A}\mathbf{B}\) must be symmetric.
- True
- False
If \(\mathbf{A}\in M_{m,r}(\mathbb{K}),\) \(\mathbf{B}\in M_{r,n}(\mathbb{K})\) and \(\mathbf{C}\in M_{r,m}(\mathbb{K}),\) then the products \(\mathbf{A}\mathbf{B},\) \(\mathbf{A}\mathbf{C},\) \(\mathbf{C}\mathbf{A}\) and \(\mathbf{C}^T\mathbf{B}\) are all defined.
- True
- False
If \(\mathbf A\in M_{2,2}(\mathbb{R})\) is such that \(\mathbf A^2=\mathbf{0}_2,\) then \(\mathbf A=\mathbf{0}_2.\)
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}\mathbf{A}^T\) is always square.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}\mathbf{A}^T\) is always symmetric.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) then \(\mathbf{A}\mathbf{A}^T=\mathbf{A}^T\mathbf{A}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}, \mathbf{C}\in M_{n,n}(\mathbb K),\) and \(\mathbf{A}\mathbf{B}=\mathbf{A}\mathbf{C},\) then \(\mathbf{B}=\mathbf{C}.\)
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}+\mathbf{A}^T\) is always symmetric.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}-\mathbf{A}^T\) is always anti-symmetric.
- True
- False
There is an invertible matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{A}^T = \mathbf{0}_n.\)
- True
- False
There is an invertible matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{A}^T = \mathbf{1}_{n}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are invertible, then so is \(\mathbf{A}\mathbf{B}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are invertible, then so is \(\mathbf{A}+\mathbf{B}.\)
- True
- False
If \(\mathbf A\in M_{n,n}(\mathbb K)\) is such that \[\mathbf{1}_n + \mathbf A + \ldots + \mathbf A^m = \mathbf{0}_n\] for some natural number \(m \geqslant 1,\) then \(\mathbf A\) admits an inverse.
- True
- False
If \(\mathbf A,\mathbf B,\mathbf C\in M_{n,n}(\mathbb K)\) are such that \(\mathbf A\mathbf B = \mathbf A\mathbf C = \mathbf{1}_{n},\) then \(\mathbf B = \mathbf C.\)
- True
- False
If \(\mathbf A \in M_{n,n}(\mathbb K)\) is such that \(\mathbf A^3 = \mathbf{1}_{n}\) and \(\mathbf A\neq\mathbf{1}_{n},\) \(\mathbf A^2\neq\mathbf{1}_{n},\) then \(\mathbf A^{-1} = \mathbf A^{3k+1}\) for any natural number \(k.\)
- True
- False
If \(\mathbf A,\mathbf B \in M_{n,n}(\mathbb K)\) are both invertible, then \((\mathbf A\mathbf B)^{-1} = \mathbf A^{-1}\mathbf B^{-1}.\)
- True
- False
If \(\mathbf A \in M_{n,n}(\mathbb K)\) is invertible, then \(\mathbf{A}^T\) is also invertible and \((\mathbf A^T)^{-1}=(\mathbf A^{-1})^T.\)
- True
- False