2 Matrices

2.1 Definitions

A matrix (plural matrices) is simply a rectangular block of numbers. As we will see below, every matrix gives rise to a mapping sending a finite list of numbers to another finite list of numbers. Mappings arising from matrices are called linear, and linear mappings are among the most fundamental objects in mathematics. In the Linear Algebra modules we develop the theory of linear maps as well as the theory of vector spaces, the natural habitat of linear maps. While this theory may come across as quite abstract, it is in fact at the heart of many real-world applications, including optics and quantum physics, radio astronomy, MP3 and JPEG compression, X-ray crystallography, MRI scans and machine learning, to name just a few.

Throughout the Linear Algebra modules, \(\mathbb{K}\) stands for either the real numbers \(\mathbb{R}\) or the complex numbers \(\mathbb{C},\) but almost all statements are also valid over arbitrary fields.

We start with some definitions. In this chapter, \(m,n,{\tilde{m}},{\tilde{n}}\) denote natural numbers.

Definition 2.1 • Matrix

  • A rectangular block of scalars \(A_{ij} \in \mathbb{K},\) \(1\leqslant i\leqslant m,1\leqslant j\leqslant n\) \[\tag{2.1} \mathbf{A}=\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn}\end{pmatrix}\] is called an \(m\times n\) matrix with entries in \(\mathbb{K}.\)

  • We also say that \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix, that \(\mathbf{A}\) has size \(m\times n\) and that \(\mathbf{A}\) has \(m\) rows and \(n\) columns.

  • The entry \(A_{ij}\) of \(\mathbf{A}\) is said to have row index \(i\) where \(1\leqslant i\leqslant m,\) column index \(j\) where \(1\leqslant j\leqslant n\) and will be referred to as the \((i,j)\)-th entry of \(\mathbf{A}.\)

  • A shorthand notation for (2.1) is \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\)

  • For matrices \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) we write \(\mathbf{A}=\mathbf{B},\) provided \(A_{ij}=B_{ij}\) for all \(1\leqslant i\leqslant m\) and all \(1\leqslant j \leqslant n.\)

Definition 2.2 • Set of matrices

  • The set of \(m\)-by-\(n\) matrices with entries in \(\mathbb{K}\) will be denoted by \(M_{m,n}(\mathbb{K}).\)

  • The elements of the set \(M_{m,1}(\mathbb{K})\) are called column vectors of length \(m\) and the elements of the set \(M_{1,n}(\mathbb{K})\) are called row vectors of length \(n\).

  • We will use the Latin alphabet for column vectors and decorate them with an arrow. For a column vector \[\vec{x}=\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m\end{pmatrix} \in M_{m,1}(\mathbb{K})\] we also use the shorthand notation \(\vec{x}=(x_i)_{1\leqslant i\leqslant m}\) and we write \([\vec{x}]_i\) for the \(i\)-th entry of \(\vec{x},\) so that \([\vec{x}]_i=x_i\) for all \(1\leqslant i\leqslant m.\)

  • We will use the Greek alphabet for row vectors and decorate them with an arrow. For a row vector \[\vec{\xi}=\begin{pmatrix} \xi_1 & \xi_2 & \cdots & \xi_n\end{pmatrix} \in M_{1,n}(\mathbb{K})\] we also use the shorthand notation \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\) and we write \([\vec{\xi}]_i\) for the \(i\)-th entry of \(\vec{\xi},\) so that \([\vec{\xi}]_i=\xi_i\) for all \(1\leqslant i\leqslant n.\)

Remark 2.3 • Notation

  1. A matrix is always denoted by a bold capital letter, such as \(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}.\)

  2. The entries of the matrix are denoted by \(A_{ij},B_{ij},C_{ij},D_{ij},\) respectively.

  3. We may think of an \(m\times n\) matrix as consisting of \(n\) column vectors of length \(m.\) The column vectors of the matrix are denoted by \(\vec{a}_i,\vec{b}_i,\vec{c}_i,\vec{d}_i,\) respectively.

  4. We may think of an \(m\times n\) matrix as consisting of \(m\) row vectors of length \(n.\) The row vectors of the matrix are denoted by \(\vec{\alpha}_i,\vec{\beta}_i,\vec{\gamma}_i,\vec{\delta}_i,\) respectively.

  5. For a matrix \(\mathbf{A}\) we also write \([\mathbf{A}]_{ij}\) for the \((i,j)\)-th entry of \(\mathbf{A}.\) So for \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) we have \([\mathbf{A}]_{ij}=A_{ij}\) for all \(1\leqslant i\leqslant m, 1\leqslant j \leqslant n.\)

Example 2.4

For \[\mathbf{A}=\begin{pmatrix} \pi & \sqrt{2} \\ -1 & 5/3 \\ \log 2 & 3 \end{pmatrix} \in M_{3,2}(\mathbb{R}),\] we have for instance \([\mathbf{A}]_{32}=3,\) \([\mathbf{A}]_{12}=\sqrt{2},\) \([\mathbf{A}]_{21}=-1\) and \[\vec{a}_1=\begin{pmatrix} \pi \\ -1 \\ \log 2 \end{pmatrix},\quad \vec{a}_2=\begin{pmatrix} \sqrt{2} \\ 5/3\\ 3 \end{pmatrix}, \quad \vec{\alpha}_2=\begin{pmatrix} -1 & 5/3 \end{pmatrix},\quad \vec{\alpha}_3=\begin{pmatrix} \log 2 & 3 \end{pmatrix}.\]
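As an aside for readers who like to experiment, the matrix of Example 2.4 can be entered in Python using the NumPy library (this is an illustration, not part of the formal development; note that NumPy indexes entries from \(0\) rather than \(1\)):

```python
import numpy as np

# The matrix A from Example 2.4, with entries in R.
A = np.array([[np.pi, np.sqrt(2)],
              [-1.0, 5 / 3],
              [np.log(2), 3.0]])

# NumPy counts from 0, so the (i,j)-th entry [A]_{ij} is A[i-1, j-1].
entry_32 = A[2, 1]    # [A]_{32} = 3
entry_12 = A[0, 1]    # [A]_{12} = sqrt(2)

# Column vector a_1 and row vector alpha_2 of A.
a1 = A[:, 0]
alpha2 = A[1, :]
```

The slicing `A[:, j]` and `A[i, :]` extracts the \((j+1)\)-th column and \((i+1)\)-th row, matching the column and row vectors defined in Remark 2.3.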

Recall that for sets \(\mathcal{X}\) and \(\mathcal{Y}\) we write \(\mathcal{X}\times \mathcal{Y}\) for the Cartesian product of \(\mathcal{X}\) and \(\mathcal{Y},\) defined as the set of ordered pairs \((x,y)\) with \(x \in \mathcal{X}\) and \(y \in \mathcal{Y}.\) Moreover, \(\mathcal{X}\times \mathcal{X}\) is usually denoted as \(\mathcal{X}^2.\) Likewise, for a natural number \(n \in \mathbb{N},\) we write \(\mathcal{X}^n\) for the set of ordered lists consisting of \(n\) elements of \(\mathcal{X}.\) We will also refer to ordered lists consisting of \(n\) elements as \(n\)-tuples. The elements of \(\mathcal{X}^n\) are denoted by \((x_1,x_2,\ldots,x_n)\) with \(x_i \in \mathcal{X}\) for all \(1\leqslant i\leqslant n.\) In particular, for all \(n \in \mathbb{N}\) we have a bijective map from \(\mathbb{K}^n\) to \(M_{n,1}(\mathbb{K})\) given by \[\tag{2.2} (x_1,\ldots,x_n) \mapsto \begin{pmatrix} x_1 \\ \vdots \\ x_n\end{pmatrix}.\] For this reason, we also write \(\mathbb{K}^n\) for the set of column vectors of length \(n\) with entries in \(\mathbb{K}.\) The set of row vectors of length \(n\) with entries in \(\mathbb{K}\) will be denoted by \(\mathbb{K}_{n}.\)

Definition 2.5 • Special matrices and vectors

  • The zero matrix \(\mathbf{0}_{m,n}\) is the \(m\times n\) matrix whose entries are all zero. We will also write \(\mathbf{0}_n\) for the \(n\times n\) matrix whose entries are all zero.

  • Matrices with equal number \(n\) of rows and columns are known as square matrices.

  • An entry \(A_{ij}\) of a square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is said to be a diagonal entry if \(i=j\) and an off-diagonal entry otherwise. A matrix whose off-diagonal entries are all zero is said to be diagonal.

  • We write \(\mathbf{1}_{n}\) for the diagonal \(n\times n\) matrix whose diagonal entries are all equal to \(1.\) Using the so-called Kronecker delta defined by the rule \[\delta_{ij}=\left\{\begin{array}{cc} 1 & i=j, \\ 0 & i\neq j,\end{array}\right.\] we have \([\mathbf{1}_{n}]_{ij}=\delta_{ij}\) for all \(1\leqslant i,j\leqslant n.\) The matrix \(\mathbf{1}_{n}\) is called the unit matrix or identity matrix of size \(n.\)

  • The standard basis of \(\mathbb{K}^n\) is the set \(\{\vec{e}_1,\vec{e}_2,\ldots,\vec{e}_n\}\) consisting of the column vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)

  • The standard basis of \(\mathbb{K}_n\) is the set \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\ldots,\vec{\varepsilon}_n\}\) consisting of the row vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)

Example 2.6

  1. Special matrices: \[\mathbf{0}_{2,3}=\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{1}_{2}=\begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}, \quad \qquad \mathbf{1}_{3}=\begin{pmatrix} 1 & 0 &0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}.\]

  2. The standard basis of \(\mathbb{K}^3\) is \(\{\vec{e}_1,\vec{e}_2,\vec{e}_3\},\) where \[\vec{e}_1=\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \vec{e}_2=\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \vec{e}_3=\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.\]

  3. The standard basis of \(\mathbb{K}_3\) is \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\vec{\varepsilon}_3\},\) where \[\vec{\varepsilon}_1=\begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \quad \vec{\varepsilon}_2=\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \vec{\varepsilon}_3=\begin{pmatrix} 0 & 0 & 1 \end{pmatrix}.\]
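The special matrices of Example 2.6 can also be built in NumPy (again only an illustrative sketch; `np.zeros` and `np.eye` are the standard constructors for the zero and identity matrices):

```python
import numpy as np

Z = np.zeros((2, 3))    # the zero matrix 0_{2,3}
I3 = np.eye(3)          # the identity matrix 1_3

# The standard basis vectors e_1, e_2, e_3 of K^3 are the columns of 1_3.
e1, e2, e3 = I3[:, 0], I3[:, 1], I3[:, 2]
```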

2.2 Matrix operations

We can multiply a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) with a scalar \(s\in \mathbb{K}.\) This amounts to multiplying each entry of \(\mathbf{A}\) with \(s\):

Definition 2.7

Scalar multiplication in \(M_{m,n}(\mathbb{K})\) is the map \[\cdot_{M_{m,n}(\mathbb{K})} : \mathbb{K}\times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (s,\mathbf{A}) \mapsto s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}\] defined by the rule \[\tag{2.3} s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=(s\cdot_{\mathbb{K}} A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(s\cdot_{\mathbb{K}} A_{ij}\) denotes the field multiplication of scalars \(s,A_{ij} \in \mathbb{K}.\)

Remark 2.8

Here we multiply with \(s\) from the left. Likewise, we define \(\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s=(A_{ij}\cdot_{\mathbb{K}} s)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) that is, we multiply from the right. Of course, since multiplication of scalars is commutative, we have \(s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s,\) that is, left multiplication and right multiplication give the same matrix. Be aware that this is not true in every number system. An example that you might encounter later on is the so-called quaternions, where multiplication fails to be commutative.

The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) of identical size is defined as follows:

Definition 2.9

Addition in \(M_{m,n}(\mathbb{K})\) is the map \[+_{M_{m,n}(\mathbb{K})} : M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (\mathbf{A},\mathbf{B})\mapsto \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}\] defined by the rule \[\tag{2.4} \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}=(A_{ij}+_{\mathbb{K}}B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(A_{ij}+_{\mathbb{K}}B_{ij}\) denotes the field addition of scalars \(A_{ij},B_{ij}\in \mathbb{K}.\)

Remark 2.10 • Abusing notation

  • Field addition takes two scalars and produces another scalar, thus it is a map \(\mathbb{K}\times \mathbb{K}\to \mathbb{K},\) whereas addition of matrices is a map \(M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}).\) For this reason we wrote \(+_{M_{m,n}(\mathbb{K})}\) above in order to distinguish matrix addition from field addition of scalars. Of course, it is quite cumbersome to always write \(+_{M_{m,n}(\mathbb{K})}\) and \(+_{\mathbb{K}},\) so we follow the usual custom of writing \(+,\) both for field addition of scalars and for matrix addition, trusting that the reader is aware of the difference.

  • Likewise, we simply write \(\cdot\) instead of \(\cdot_{M_{m,n}(\mathbb{K})}\) or omit the dot entirely, so that \(s\cdot \mathbf{A}=s\mathbf{A}=s\cdot_{M_{m,n}(\mathbb{K})}\mathbf{A}\) for \(s\in \mathbb{K}\) and \(\mathbf{A}\in M_{m,n}(\mathbb{K}).\)

Example 2.11

  • Multiplication of a matrix by a scalar: \[5\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}5=\begin{pmatrix} 5\cdot 1 & 5\cdot 2 \\ 5\cdot 3 & 5\cdot 4 \end{pmatrix}=\begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix}.\]

  • Addition of matrices:

    \[\begin{pmatrix} 3 & -5 \\ -2 & 8 \end{pmatrix} + \begin{pmatrix} -3 & 8 \\ 7 & 10 \end{pmatrix}=\begin{pmatrix} 0 & 3 \\ 5 & 18 \end{pmatrix}.\]
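Both operations of Example 2.11 are entrywise, so in NumPy (as a sketch, outside the notes proper) they are the ordinary `*` and `+` operators on arrays of matching size:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
five_A = 5 * A              # scalar multiplication: multiply every entry by 5

B = np.array([[3, -5], [-2, 8]])
C = np.array([[-3, 8], [7, 10]])
S = B + C                   # matrix addition: add entry by entry; sizes must agree
```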

If the number of columns of a matrix \(\mathbf{A}\) is equal to the number of rows of a matrix \(\mathbf{B},\) we define the matrix product \(\mathbf{A}\mathbf{B}\) of \(\mathbf{A}\) and \(\mathbf{B}\) as follows:

Definition 2.12 • Matrix multiplication

Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) be an \(m\)-by-\(n\) matrix and \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) be an \(n\)-by-\({\tilde{m}}\) matrix. The matrix product of \(\mathbf{A}\) and \(\mathbf{B}\) is the \(m\)-by-\({\tilde{m}}\) matrix \(\mathbf{A}\mathbf{B}\in M_{m,{\tilde{m}}}(\mathbb{K})\) whose entries are defined by the rule \[[\mathbf{A}\mathbf{B}]_{ik}=A_{i1}B_{1k}+A_{i2}B_{2k}+\cdots +A_{in}B_{nk}=\sum_{j =1}^nA_{ij}B_{jk}=\sum_{j=1}^n [\mathbf{A}]_{ij}[\mathbf{B}]_{jk}\] for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant {\tilde{m}}.\)

Remark 2.13 • Pairing of row and column vectors

We may define a pairing \(\mathbb{K}_n \times \mathbb{K}^n \to \mathbb{K}\) of a row vector of length \(n\) and a column vector of length \(n\) by the rule \[(\vec{\xi},\vec{x})\mapsto \vec{\xi}\vec{x}=\xi_1x_1+\xi_2x_2+\cdots+\xi_n x_n\] for all \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\in \mathbb{K}_n\) and for all \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{K}^n.\) So we multiply the first entry of \(\vec{\xi}\) with the first entry of \(\vec{x},\) add the product of the second entry of \(\vec{\xi}\) and the second entry of \(\vec{x}\) and continue in this fashion until the last entry of \(\vec{\xi}\) and \(\vec{x}.\)

The \((i,j)\)-th entry of the matrix product of \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,\tilde{m}}(\mathbb{K})\) is then given by the pairing \[[\mathbf{A}\mathbf{B}]_{ij}=\vec{\alpha}_i\vec{b}_j\] of the \(i\)-th row vector \(\vec{\alpha}_i\) of \(\mathbf{A}\) and the \(j\)-th column vector \(\vec{b}_j\) of \(\mathbf{B}.\)
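The row-column pairing above can be checked numerically (an illustrative NumPy sketch; `@` denotes the matrix product, and the same operator applied to a row and a column of matching length computes the pairing):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # a 2-by-3 matrix
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])          # a 3-by-2 matrix

P = A @ B                       # the 2-by-2 matrix product AB

# [AB]_{12} is the pairing of row alpha_1 of A with column b_2 of B
# (indices shifted by one because NumPy counts from 0).
entry_12 = A[0, :] @ B[:, 1]
```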

Remark 2.14 • Matrix multiplication is not commutative

If \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix and \(\mathbf{B}\) an \(n\)-by-\(m\) matrix, then both \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined, but for \(m\neq n\) we always have \(\mathbf{A}\mathbf{B}\neq \mathbf{B}\mathbf{A},\) since \(\mathbf{A}\mathbf{B}\) is an \(m\)-by-\(m\) matrix while \(\mathbf{B}\mathbf{A}\) is an \(n\)-by-\(n\) matrix. Even when \(n=m,\) so that both \(\mathbf{A}\) and \(\mathbf{B}\) are square matrices, it is false in general that \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}.\)
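A concrete pair of square matrices with \(\mathbf{A}\mathbf{B}\neq \mathbf{B}\mathbf{A}\) (sketched in NumPy for illustration):

```python
import numpy as np

# Two 2-by-2 matrices whose products in the two orders differ.
A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])

AB = A @ B    # equals [[1, 0], [0, 0]]
BA = B @ A    # equals [[0, 0], [0, 1]]
```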

The matrix operations have the following properties:

Proposition 2.15 • Properties of matrix operations

  • \(\mathbf{0}_{m,n}+\mathbf{A}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);

  • \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}\) and \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);

  • \(\mathbf{0}_{{\tilde{m}},m}\mathbf{A}=\mathbf{0}_{{\tilde{m}},n}\) and \(\mathbf{A}\mathbf{0}_{n,{\tilde{m}}}=\mathbf{0}_{m,{\tilde{m}}}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);

  • \(\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}\) and \((\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})\) for all \(\mathbf{A},\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)

  • \(0\cdot \mathbf{A}=\mathbf{0}_{m,n}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);

  • \((s_1s_2)\mathbf{A}=s_1(s_2 \mathbf{A})\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(s_1,s_2 \in \mathbb{K}\);

  • \(\mathbf{A}(s\mathbf{B})=s(\mathbf{A}\mathbf{B})=(s\mathbf{A})\mathbf{B}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) and all \(s\in \mathbb{K}\);

  • \(s(\mathbf{A}+\mathbf{B})=s\mathbf{A}+s\mathbf{B}\) for all \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(s\in \mathbb{K}\);

  • \((s_1+s_2)\mathbf{A}=s_1\mathbf{A}+s_2\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and for all \(s_1,s_2 \in \mathbb{K}\);

  • \((\mathbf{B}+\mathbf{C})\mathbf{A}=\mathbf{B}\mathbf{A}+\mathbf{C}\mathbf{A}\) for all \(\mathbf{B},\mathbf{C}\in M_{{\tilde{m}},m}(\mathbb{K})\) and for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);

  • \(\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}\) for all \(\mathbf{A}\in M_{{\tilde{m}},m}(\mathbb{K})\) and for all \(\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K}).\)

Proof. We only show the second and the last property. The proofs of the remaining ones are similar and/or elementary consequences of the properties of addition and multiplication of scalars.

To show the second property consider \(\mathbf{A}\in M_{m,n}(\mathbb{K}).\) Then, by definition, we have for all \(1\leqslant k\leqslant m\) and all \(1\leqslant j\leqslant n\) \[[\mathbf{1}_{m}\mathbf{A}]_{kj}=\sum_{i=1}^m[\mathbf{1}_{m}]_{ki}[\mathbf{A}]_{ij}=\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}=[\mathbf{A}]_{kj},\] where the second-to-last equality uses that \(\delta_{ki}\) is \(0\) unless \(i=k,\) in which case \(\delta_{kk}=1.\) We conclude that \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}.\) Likewise, we obtain for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant n\) \[[\mathbf{A}\mathbf{1}_{n}]_{ik}=\sum_{j=1}^n[\mathbf{A}]_{ij}[\mathbf{1}_{n}]_{jk}=\sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}=[\mathbf{A}]_{ik}\] so that \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}.\) The identities \[\boxed{\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}\qquad \text{and}\qquad \sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}}\] are used repeatedly in Linear Algebra, so make sure you understand them.

For the last property, applying the definition of matrix multiplication gives \[\mathbf{A}\mathbf{B}=\left(\sum_{i=1}^m A_{ki}B_{ij}\right)_{1\leqslant k\leqslant {\tilde{m}}, 1\leqslant j \leqslant n}\quad \text{and}\quad \mathbf{A}\mathbf{C}=\left(\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant {\tilde{m}}, 1\leqslant j \leqslant n},\] so that \[\begin{aligned} \mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}&=\left(\sum_{i=1}^m A_{ki}B_{i j}+\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant {\tilde{m}}, 1\leqslant j \leqslant n}\\&=\left(\sum_{i=1}^m A_{ki}\left(B_{i j}+C_{i j}\right)\right)_{1\leqslant k\leqslant {\tilde{m}}, 1\leqslant j \leqslant n}=\mathbf{A}(\mathbf{B}+\mathbf{C}), \end{aligned}\] where we use that \[\mathbf{B}+\mathbf{C}=\left(B_{ij}+C_{ij}\right)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\]

Finally, we may flip a matrix along its “diagonal entries”, that is, we interchange the role of rows and columns. More precisely:

Definition 2.16 • Transpose of a matrix

  • The transpose of a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is the matrix \(\mathbf{A}^{T} \in M_{n,m}(\mathbb{K})\) satisfying \[\left[\mathbf{A}^T\right]_{ij}=[\mathbf{A}]_{ji}\] for all \(1\leqslant i\leqslant n\) and \(1\leqslant j\leqslant m.\)

  • A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=\mathbf{A}^T\) is called symmetric.

  • A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=-\mathbf{A}^T\) is called anti-symmetric.

Example 2.17

If \[\mathbf{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6\end{pmatrix}, \quad \text{then} \quad \mathbf{A}^T=\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6\end{pmatrix}.\]

Remark 2.18 • Properties of the transpose

  1. For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we have by definition \((\mathbf{A}^T)^T=\mathbf{A}.\)

  2. For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K}),\) we have \[(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T.\] Indeed, by definition we have for all \(1\leqslant i\leqslant {\tilde{m}}\) and all \(1\leqslant j\leqslant m\) \[\left[(\mathbf{A}\mathbf{B})^T\right]_{ij}=[\mathbf{A}\mathbf{B}]_{ji}=\sum_{k=1}^n [\mathbf{A}]_{jk}[\mathbf{B}]_{ki}=\sum_{k=1}^n\left[\mathbf{B}^T\right]_{ik}\left[\mathbf{A}^T\right]_{kj}=\left[\mathbf{B}^T\mathbf{A}^T\right]_{ij}.\]
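The two transpose properties can be verified on a concrete example (a NumPy sketch, using the matrix of Example 2.17 together with an arbitrarily chosen \(2\)-by-\(3\) matrix \(\mathbf{B}\)):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # 3-by-2, as in Example 2.17
B = np.array([[1, -1, 0],
              [2, 0, 1]])       # 2-by-3, chosen for illustration

At = A.T                        # the transpose A^T, a 2-by-3 matrix
lhs = (A @ B).T                 # (AB)^T
rhs = B.T @ A.T                 # B^T A^T, note the reversed order
```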

2.3 Mappings associated to matrices

Definition 2.19 • Mapping associated to a matrix

For an \((m\times n)\)-matrix \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K})\) with column vectors \(\vec{a}_1,\ldots,\vec{a}_n \in \mathbb{K}^m\) we define a mapping \[f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m, \qquad \vec{x} \mapsto \mathbf{A}\vec{x},\] where the column vector \(\mathbf{A}\vec{x} \in \mathbb{K}^m\) is obtained by matrix multiplication of the matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and the column vector \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{K}^n\) \[\mathbf{A}\vec{x}=\vec{a}_1x_1+\vec{a}_2x_2+\cdots +\vec{a}_n x_n=\begin{pmatrix} A_{11}x_1+A_{12}x_2+\cdots +A_{1n}x_n\\ A_{21}x_1+A_{22}x_2+\cdots +A_{2n}x_n\\ \vdots \\ A_{m1}x_1+A_{m2}x_2+\cdots +A_{mn}x_n\end{pmatrix}.\]
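The mapping \(f_\mathbf{A}\) and the column-vector description of \(\mathbf{A}\vec{x}\) can be sketched in NumPy as follows (illustration only; the function name `f_A` is ours):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # A in M_{3,2}(R), so f_A maps R^2 to R^3

def f_A(x):
    """The mapping associated to A, sending x to A x."""
    return A @ x

x = np.array([1.0, 2.0])
y = f_A(x)

# A x is the combination a_1 x_1 + a_2 x_2 of the columns of A.
combo = A[:, 0] * x[0] + A[:, 1] * x[1]
```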

Recall that if \(f : \mathcal{X}\to \mathcal{Y}\) and \(g : \mathcal{X} \to \mathcal{Y}\) are mappings from a set \(\mathcal{X}\) into a set \(\mathcal{Y},\) then we write \(f=g\) if \(f(x)=g(x)\) for all elements \(x \in \mathcal{X}.\)

The matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) uniquely determines the mapping \(f_\mathbf{A}\):

Proposition 2.20

Let \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K}).\) Then \(f_\mathbf{A}=f_\mathbf{B}\) if and only if \(\mathbf{A}=\mathbf{B}.\)

Proof. If \(\mathbf{A}=\mathbf{B},\) then \(A_{ij}=B_{ij}\) for all \(1\leqslant i\leqslant m, 1\leqslant j \leqslant n,\) hence \(\mathbf{A}\vec{x}=\mathbf{B}\vec{x}\) for every \(\vec{x}\in \mathbb{K}^n\) and we conclude that \(f_\mathbf{A}=f_\mathbf{B}.\) In order to show the converse direction we consider the standard basis \(\vec{e}_i=(\delta_{ij})_{1\leqslant j\leqslant n},\) \(i=1,\ldots,n\) of \(\mathbb{K}^n.\) Now by assumption \[f_\mathbf{A}(\vec{e}_i)=\begin{pmatrix}A_{1i} \\ A_{2i} \\ \vdots \\ A_{mi}\end{pmatrix}=f_\mathbf{B}(\vec{e}_i)=\begin{pmatrix}B_{1i} \\ B_{2i} \\ \vdots \\ B_{mi}\end{pmatrix}.\] Since this holds for all \(i=1,\ldots,n,\) we conclude \(A_{ji}=B_{ji}\) for all \(j=1,\ldots, m\) and \(i=1,\ldots, n.\) Therefore, we have \(\mathbf{A}=\mathbf{B},\) as claimed.

Recall that if \(f : \mathcal{X}\to \mathcal{Y}\) is a mapping from a set \(\mathcal{X}\) into a set \(\mathcal{Y}\) and \(g : \mathcal{Y} \to \mathcal{Z}\) a mapping from \(\mathcal{Y}\) into a set \(\mathcal{Z},\) we can consider the composition of \(g\) and \(f\) \[g\circ f : \mathcal{X} \to \mathcal{Z}, \qquad x \mapsto g(f(x)).\]

The motivation for the Definition 2.12 of matrix multiplication is given by the following theorem which states that the mapping \(f_{\mathbf{A}\mathbf{B}}\) associated to the matrix product \(\mathbf{A}\mathbf{B}\) is the composition of the mapping \(f_\mathbf{A}\) associated to the matrix \(\mathbf{A}\) and the mapping \(f_\mathbf{B}\) associated to the matrix \(\mathbf{B}.\) More precisely:

Theorem 2.21

Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) so that \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) and \(f_\mathbf{B}: \mathbb{K}^{{\tilde{m}}} \to \mathbb{K}^n\) and \(f_{\mathbf{A}\mathbf{B}} : \mathbb{K}^{{\tilde{m}}} \to \mathbb{K}^{m}.\) Then \(f_{\mathbf{A}\mathbf{B}}=f_\mathbf{A}\circ f_\mathbf{B}.\)

Proof. For \(\vec{x}=(x_k)_{1\leqslant k\leqslant {\tilde{m}}} \in \mathbb{K}^{{\tilde{m}}}\) we write \(\vec{y}=f_\mathbf{B}(\vec{x}).\) Then, by definition, \(\vec{y}=\mathbf{B}\vec{x}=(y_j)_{1\leqslant j\leqslant n}\) where \[\tag{2.5} y_j=B_{j1}x_1+B_{j2}x_2+\cdots+B_{j{\tilde{m}}}x_{{\tilde{m}}}=\sum_{k=1}^{{\tilde{m}}} B_{jk}x_k.\] Hence writing \(\vec{z}=f_\mathbf{A}(\vec{y})=\mathbf{A}\vec{y},\) we have \(\vec{z}=(z_i)_{1\leqslant i\leqslant m},\) where \[\begin{aligned} z_i&=A_{i1}y_1+A_{i2}y_2+\cdots +A_{in}y_n=\sum_{j=1}^n A_{ij}y_j=\sum_{j=1}^nA_{ij}\sum_{k=1}^{{\tilde{m}}}B_{jk}x_k\\ &=\sum_{k=1}^{{\tilde{m}}}\left(\sum_{j=1}^nA_{ij}B_{jk}\right)x_k \end{aligned}\] where we have used (2.5). Since \(\mathbf{A}\mathbf{B}=(C_{ik})_{1\leqslant i\leqslant m, 1\leqslant k \leqslant {\tilde{m}}}\) with \[C_{ik}=\sum_{j=1}^n A_{ij}B_{jk},\] we conclude that \(\vec{z}=f_{\mathbf{A}\mathbf{B}}(\vec{x}),\) as claimed.
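Theorem 2.21 is easy to test numerically (a NumPy sketch with randomly generated matrices; `np.allclose` is used because floating-point arithmetic is only approximate):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # f_A maps R^4 to R^3
B = rng.standard_normal((4, 2))   # f_B maps R^2 to R^4
x = rng.standard_normal(2)

# f_{AB}(x) should agree with (f_A o f_B)(x) = f_A(f_B(x)).
lhs = (A @ B) @ x
rhs = A @ (B @ x)
```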

Combining Theorem 2.21 and Proposition 2.20, we also obtain:

Corollary 2.22

Let \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) and \(\mathbf{C}\in M_{{\tilde{m}},{\tilde{n}}}(\mathbb{K}).\) Then \[(\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{A}(\mathbf{B}\mathbf{C}),\] that is, the matrix product is associative.

Proof. By Theorem 2.21 we have \(f_{(\mathbf{A}\mathbf{B})\mathbf{C}}=f_{\mathbf{A}\mathbf{B}}\circ f_\mathbf{C}\) and \(f_{\mathbf{A}(\mathbf{B}\mathbf{C})}=f_\mathbf{A}\circ f_{\mathbf{B}\mathbf{C}},\) so using Proposition 2.20 it is enough to show that \[f_{\mathbf{A}\mathbf{B}}\circ f_\mathbf{C}=f_\mathbf{A}\circ f_{\mathbf{B}\mathbf{C}}.\] Using Theorem 2.21 again, we get for all \(\vec{x} \in \mathbb{K}^{{\tilde{n}}}\) \[\left(f_{\mathbf{A}\mathbf{B}}\circ f_\mathbf{C}\right)(\vec{x})=f_{\mathbf{A}\mathbf{B}}(f_\mathbf{C}(\vec{x}))=f_\mathbf{A}(f_\mathbf{B}(f_\mathbf{C}(\vec{x})))=f_\mathbf{A}(f_{\mathbf{B}\mathbf{C}}(\vec{x}))=\left(f_\mathbf{A}\circ f_{\mathbf{B}\mathbf{C}}\right)(\vec{x}).\]

Remark 2.23

For all \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) satisfies the following two very important properties \[\tag{2.6} \begin{aligned} f_\mathbf{A}(\vec{x}+\vec{y})&=f_\mathbf{A}(\vec{x})+f_\mathbf{A}(\vec{y}),\qquad &&(\text{additivity}),\\ f_\mathbf{A}(s\cdot \vec{x})&=s\cdot f_\mathbf{A}(\vec{x}),\qquad &&(\text{$1$-homogeneity}), \end{aligned}\] for all \(\vec{x},\vec{y} \in \mathbb{K}^{n}\) and \(s\in \mathbb{K}.\) Indeed, using Proposition 2.15 we have \[f_\mathbf{A}(\vec{x}+\vec{y})=\mathbf{A}(\vec{x}+\vec{y})=\mathbf{A}\vec{x}+\mathbf{A}\vec{y}=f_\mathbf{A}(\vec{x})+f_\mathbf{A}(\vec{y})\] and \[f_\mathbf{A}(s\cdot \vec{x})=\mathbf{A}(s\vec{x})=s\cdot (\mathbf{A}\vec{x})=s\cdot f_\mathbf{A}(\vec{x}).\] Mappings satisfying (2.6) are called linear.
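Both properties in (2.6) can be observed on a concrete matrix (an illustrative NumPy sketch with arbitrarily chosen \(\mathbf{A},\) \(\vec{x},\) \(\vec{y}\) and \(s\)):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [1.0, -1.0, 3.0]])   # f_A maps R^3 to R^2

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])
s = 4.0

additive = A @ (x + y)             # f_A(x + y)
sum_of_images = A @ x + A @ y      # f_A(x) + f_A(y)
homogeneous = A @ (s * x)          # f_A(s x)
scaled_image = s * (A @ x)         # s f_A(x)
```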

Example 2.24

Notice that “most” functions \(\mathbb{R}\to \mathbb{R}\) are neither additive nor \(1\)-homogeneous. As an example, consider a mapping \(f : \mathbb{R}\to \mathbb{R}\) which satisfies the \(1\)-homogeneity property. Let \(a=f(1) \in \mathbb{R}.\) Then the \(1\)-homogeneity implies that for all \(x \in \mathbb{R}=\mathbb{R}^1\) we have \[f(x)=f(x\cdot 1)=x\cdot f(1)=a \cdot x,\] showing that the only \(1\)-homogeneous mappings \(\mathbb{R}\to \mathbb{R}\) are of the form \(x \mapsto a x,\) where \(a\) is a real number. In particular, \(\sin,\cos,\tan,\log,\exp,\sqrt{\phantom{x}}\) and all polynomials of degree higher than one are not linear.
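For instance, \(\sin\) fails \(1\)-homogeneity already at \(s=2,\) \(x=1\): \(\sin(2\cdot 1)\approx 0.909\) while \(2\sin(1)\approx 1.683.\) A quick numerical check (Python, illustration only):

```python
import math

# sin is not 1-homogeneous: sin(s * x) generally differs from s * sin(x).
x = 1.0
s = 2.0
lhs = math.sin(s * x)      # sin(2)
rhs = s * math.sin(x)      # 2 sin(1)
```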
