2 Matrices, I
In this chapter we’ll recall some things you learned in “Algorithmics” about matrices and vectors, and learn some new properties.
Throughout the rest of this module, \(\mathbb{K}\) stands for an arbitrary field. It won’t ever matter which field it is; so you can assume \(\mathbb{K}= \mathbb{R}\) (or \(\mathbb{C}\)) throughout if you prefer.
2.1 Definitions
We start with some definitions. In this chapter, \(m, n, p, q, r\) denote natural numbers.
A matrix (plural matrices) is simply a rectangular block of numbers. More precisely:
A rectangular block of scalars \(A_{ij} \in \mathbb{K},\) \(1\leqslant i\leqslant m,1\leqslant j\leqslant n\) \[\tag{2.1} \mathbf{A}=\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn}\end{pmatrix}\] is called an \(m\times n\) matrix with entries in \(\mathbb{K}.\)
We also say that \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix, that \(\mathbf{A}\) has size \(m\times n\) and that \(\mathbf{A}\) has \(m\) rows and \(n\) columns.
The entry \(A_{ij}\) of \(\mathbf{A}\) is said to have row index \(i\) where \(1\leqslant i\leqslant m,\) column index \(j\) where \(1\leqslant j\leqslant n\) and will be referred to as the \((i,j)\)-th entry of \(\mathbf{A}.\)
A shorthand notation for (2.1) is \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\)
For matrices \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}\) we write \(\mathbf{A}=\mathbf{B},\) provided \(A_{ij}=B_{ij}\) for all \(1\leqslant i\leqslant m\) and all \(1\leqslant j \leqslant n.\)
The set of \(m\)-by-\(n\) matrices with entries in \(\mathbb{K}\) will be denoted by \(M_{m,n}(\mathbb{K}).\)
The elements of the set \(M_{m,1}(\mathbb{K})\) are called column vectors of length \(m\) and the elements of the set \(M_{1,n}(\mathbb{K})\) are called row vectors of length \(n\).
We will use the Latin alphabet for column vectors and decorate them with an arrow. For a column vector \[\vec{x}=\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m\end{pmatrix} \in M_{m,1}(\mathbb{K})\] we also use the shorthand notation \(\vec{x}=(x_i)_{1\leqslant i\leqslant m}\) and we write \([\vec{x}]_i\) for the \(i\)-th entry of \(\vec{x},\) so that \([\vec{x}]_i=x_i\) for all \(1\leqslant i\leqslant m.\)
We will use the Greek alphabet for row vectors and decorate them with an arrow. For a row vector \[\vec{\xi}=\begin{pmatrix} \xi_1 & \xi_2 & \cdots & \xi_n\end{pmatrix} \in M_{1,n}(\mathbb{K})\] we also use the shorthand notation \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\) and we write \([\vec{\xi}]_i\) for the \(i\)-th entry of \(\vec{\xi},\) so that \([\vec{\xi}]_i=\xi_i\) for all \(1\leqslant i\leqslant n.\)
A matrix is always denoted by a bold capital letter, such as \(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}.\)
The entries of the matrix are denoted by \(A_{ij},B_{ij},C_{ij},D_{ij},\) respectively.
We may think of an \(m\times n\) matrix as consisting of \(n\) column vectors of length \(m.\) The column vectors of the matrix are denoted by \(\vec{a}_j,\vec{b}_j,\vec{c}_j,\vec{d}_j\) \((1\leqslant j\leqslant n),\) respectively.
We may think of an \(m\times n\) matrix as consisting of \(m\) row vectors of length \(n.\) The row vectors of the matrix are denoted by \(\vec{\alpha}_i,\vec{\beta}_i,\vec{\gamma}_i,\vec{\delta}_i\) \((1\leqslant i\leqslant m),\) respectively.
For a matrix \(\mathbf{A}\) we also write \([\mathbf{A}]_{ij}\) for the \((i,j)\)-th entry of \(\mathbf{A}.\) So for \(\mathbf{A}=(A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) we have \([\mathbf{A}]_{ij}=A_{ij}\) for all \(1\leqslant i\leqslant m, 1\leqslant j \leqslant n.\)
For \[\mathbf{A}=\begin{pmatrix} \pi & \sqrt{2} \\ -1 & 5/3 \\ \log 2 & 3 \end{pmatrix} \in M_{3,2}(\mathbb{R}),\] we have for instance \([\mathbf{A}]_{32}=3,\) \([\mathbf{A}]_{12}=\sqrt{2},\) \([\mathbf{A}]_{21}=-1\) and \[\vec{a}_1=\begin{pmatrix} \pi \\ -1 \\ \log 2 \end{pmatrix},\quad \vec{a}_2=\begin{pmatrix} \sqrt{2} \\ 5/3\\ 3 \end{pmatrix}, \quad \vec{\alpha}_2=\begin{pmatrix} -1 & 5/3 \end{pmatrix},\quad \vec{\alpha}_3=\begin{pmatrix} \log 2 & 3 \end{pmatrix}.\]
We’ll use the shorthand notation \(\mathbb{K}^n\) for column vectors with \(n\) entries (i.e. \(M_{n, 1}(\mathbb{K})\)), and \(\mathbb{K}_n\) for row vectors \(M_{1, n}(\mathbb{K}).\) We can of course go back and forth between them, since there’s a bijective map from \(\mathbb{K}_n\) to \(\mathbb{K}^n\) sending \((x_1\ \dots\ x_n)\) to \(\begin{pmatrix} x_1 \\ \vdots\\ x_n\end{pmatrix}\); but it’s helpful to think of them as separate, since they will interact differently with matrix multiplication.
The zero matrix \(\mathbf{0}_{m,n}\) is the \(m\times n\) matrix whose entries are all zero. We will also write \(\mathbf{0}_n\) for the \(n\times n\) matrix whose entries are all zero.
Matrices with equal number \(n\) of rows and columns are known as square matrices.
An entry \(A_{ij}\) of a square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is said to be a diagonal entry if \(i=j\) and an off-diagonal entry otherwise. A matrix whose off-diagonal entries are all zero is said to be diagonal. (The notion of “diagonal” sort of makes sense for non-square matrices too, but it’s most useful in the square case.)
We write \(\mathbf{1}_{n}\) for the diagonal \(n\times n\) matrix whose diagonal entries are all equal to \(1.\) Using the so-called Kronecker delta defined by the rule \[\delta_{ij}=\left\{\begin{array}{cc} 1 & i=j, \\ 0 & i\neq j,\end{array}\right.\] we have \([\mathbf{1}_{n}]_{ij}=\delta_{ij}\) for all \(1\leqslant i,j\leqslant n.\) The matrix \(\mathbf{1}_{n}\) is called the unit matrix or identity matrix of size \(n.\)
The standard basis of \(\mathbb{K}^n\) is the set \(\{\vec{e}_1,\vec{e}_2,\ldots,\vec{e}_n\}\) consisting of the column vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)
The standard basis of \(\mathbb{K}_n\) is the set \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\ldots,\vec{\varepsilon}_n\}\) consisting of the row vectors of the identity matrix \(\mathbf{1}_{n}\) of size \(n.\)
Special matrices: \[\mathbf{0}_{2,3}=\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{1}_{2}=\begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}, \quad \qquad \mathbf{1}_{3}=\begin{pmatrix} 1 & 0 &0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}.\]
The standard basis of \(\mathbb{K}^3\) is \(\{\vec{e}_1,\vec{e}_2,\vec{e}_3\},\) where \[\vec{e}_1=\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \vec{e}_2=\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \vec{e}_3=\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.\]
The standard basis of \(\mathbb{K}_3\) is \(\{\vec{\varepsilon}_1,\vec{\varepsilon}_2,\vec{\varepsilon}_3\},\) where \[\vec{\varepsilon}_1=\begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \quad \vec{\varepsilon}_2=\begin{pmatrix} 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \vec{\varepsilon}_3=\begin{pmatrix} 0 & 0 & 1 \end{pmatrix}.\]
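To make these definitions concrete, here is a minimal Python sketch (our own illustration, not part of the formal development; all function names are made up for this purpose) that builds the zero matrix, the identity matrix via the Kronecker delta, and the standard basis vectors, representing a matrix as a list of its rows:

```python
def zero_matrix(m, n):
    """The zero matrix 0_{m,n}: an m-by-n block of zeros."""
    return [[0 for _ in range(n)] for _ in range(m)]

def delta(i, j):
    """Kronecker delta: 1 if i == j, 0 otherwise."""
    return 1 if i == j else 0

def identity_matrix(n):
    """The identity matrix 1_n, whose (i, j) entry is delta(i, j)."""
    return [[delta(i, j) for j in range(n)] for i in range(n)]

def standard_basis_column(n, i):
    """The i-th standard basis column vector e_i of K^n (1-indexed),
    i.e. the i-th column of identity_matrix(n)."""
    return [[delta(i, k)] for k in range(1, n + 1)]

print(identity_matrix(3))           # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(standard_basis_column(3, 2))  # [[0], [1], [0]]
```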
2.2 Arithmetic with matrices
Matrix addition
The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) of identical size is defined as follows:
Addition in \(M_{m,n}(\mathbb{K})\) is the map \[+_{M_{m,n}(\mathbb{K})} : M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (\mathbf{A},\mathbf{B})\mapsto \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}\] defined by the rule \[\tag{2.2} \mathbf{A}+_{M_{m,n}(\mathbb{K})}\mathbf{B}=(A_{ij}+_{\mathbb{K}}B_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(A_{ij}+_{\mathbb{K}}B_{ij}\) denotes the field addition of scalars \(A_{ij},B_{ij}\in \mathbb{K}.\)
Field addition takes two scalars and produces another scalar, thus it is a map \(\mathbb{K}\times \mathbb{K}\to \mathbb{K},\) whereas addition of matrices is a map \(M_{m,n}(\mathbb{K}) \times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}).\) For this reason we wrote \(+_{M_{m,n}(\mathbb{K})}\) above in order to distinguish matrix addition from field addition of scalars. Of course, it is quite cumbersome to always write \(+_{M_{m,n}(\mathbb{K})}\) and \(+_{\mathbb{K}},\) so we follow the usual custom of writing “\(+\)” both for field addition of scalars and for matrix addition, trusting that the reader is aware of the difference.
(Similarly, the notations \(\vec{\varepsilon}_1\) etc. for standard basis vectors are slightly ambiguous, since we haven’t specified \(n\): \((1\quad 0)\) and \((1 \quad 0 \quad 0)\) are not literally the same vector. Whenever we use these notations it will always be clear from context what \(n\) is.)
Scalar multiplication of a matrix
We can multiply a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) by a scalar \(s \in \mathbb{K}.\) This amounts to multiplying each entry of \(\mathbf{A}\) by \(s\):
Scalar multiplication in \(M_{m,n}(\mathbb{K})\) is the map \[\cdot_{M_{m,n}(\mathbb{K})} : \mathbb{K}\times M_{m,n}(\mathbb{K}) \to M_{m,n}(\mathbb{K}), \qquad (s,\mathbf{A}) \mapsto s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}\] defined by the rule \[\tag{2.3} s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=(s\cdot_{\mathbb{K}} A_{ij})_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n} \in M_{m,n}(\mathbb{K}),\] where \(s\cdot_{\mathbb{K}} A_{ij}\) denotes the field multiplication of scalars \(s,A_{ij} \in \mathbb{K}.\)
Here we multiply by \(s\) from the left. Likewise, we define \(\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s=(A_{ij}\cdot_{\mathbb{K}} s)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n},\) that is, we multiply from the right. Of course, since multiplication of scalars is commutative, we have \(s\cdot_{M_{m,n}(\mathbb{K})} \mathbf{A}=\mathbf{A}\cdot_{M_{m,n}(\mathbb{K})}s,\) that is, left multiplication and right multiplication give the same matrix. However, in a moment we’ll encounter a more general kind of multiplication where this isn’t true.
Multiplication of a matrix by a scalar: \[5\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}5=\begin{pmatrix} 5\cdot 1 & 5\cdot 2 \\ 5\cdot 3 & 5\cdot 4 \end{pmatrix}=\begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix}.\]
Addition of matrices:
\[\begin{pmatrix} 3 & -5 \\ -2 & 8 \end{pmatrix} + \begin{pmatrix} -3 & 8 \\ 7 & 10 \end{pmatrix}=\begin{pmatrix} 0 & 3 \\ 5 & 18 \end{pmatrix}.\]
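If it helps to see these rules computationally, here is a minimal Python sketch of rules (2.2) and (2.3) (a didactic illustration assuming matrices are stored as lists of rows; mat_add and scalar_mul are names of our choosing):

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of identical size, as in rule (2.2)."""
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must agree"
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def scalar_mul(s, A):
    """Multiply every entry of A by the scalar s, as in rule (2.3)."""
    return [[s * A[i][j] for j in range(len(A[0]))] for i in range(len(A))]

# The two worked examples above:
print(scalar_mul(5, [[1, 2], [3, 4]]))                  # [[5, 10], [15, 20]]
print(mat_add([[3, -5], [-2, 8]], [[-3, 8], [7, 10]]))  # [[0, 3], [5, 18]]
```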
In particular, since row vectors and column vectors are just special types of matrix, we can multiply those by scalars; and we have the following easy but useful lemma:
For any row vector \(\vec{\xi} = (\xi_1\quad \xi_2\quad\dots\quad \xi_n) \in \mathbb{K}_n\) we have \(\vec{\xi} = \sum_{j = 1}^n\xi_j \vec{\varepsilon}_j,\) and similarly for column vectors.
Proof. Clear: the \(j\)-th entry of \(\sum_{i=1}^n\xi_i \vec{\varepsilon}_i\) is \(\sum_{i=1}^n\xi_i[\vec{\varepsilon}_i]_j=\sum_{i=1}^n\xi_i\delta_{ij}=\xi_j.\)
Matrix multiplication
If the number of columns of a matrix \(\mathbf{A}\) is equal to the number of rows of a matrix \(\mathbf{B},\) we define the matrix product \(\mathbf{A}\mathbf{B}\) of \(\mathbf{A}\) and \(\mathbf{B}\) as follows:
Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) be an \(m\)-by-\(n\) matrix and \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) be an \(n\)-by-\(r\) matrix. The matrix product of \(\mathbf{A}\) and \(\mathbf{B}\) is the \(m\)-by-\(r\) matrix \(\mathbf{A}\mathbf{B}\in M_{m,r}(\mathbb{K})\) whose entries are defined by the rule \[[\mathbf{A}\mathbf{B}]_{ik}=A_{i1}B_{1k}+A_{i2}B_{2k}+\cdots +A_{in}B_{nk}=\sum_{j=1}^nA_{ij}B_{jk}=\sum_{j=1}^n [\mathbf{A}]_{ij}[\mathbf{B}]_{jk}\] for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant r.\)
This definition might seem a little weird and arbitrary at first sight, but will turn out to give us a nice theory. (Perhaps the best motivation for it will come much later in the module, in Theorem 7.6.)
If \(\mathbf{A}\) is an \(m\)-by-\(n\) matrix and \(\mathbf{B}\) an \(n\)-by-\(m\) matrix, then both \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined, but in general \(\mathbf{A}\mathbf{B}\neq \mathbf{B}\mathbf{A}.\) In general \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) aren’t even the same size, since \(\mathbf{A}\mathbf{B}\) is an \(m\)-by-\(m\) matrix and \(\mathbf{B}\mathbf{A}\) is an \(n\)-by-\(n\) matrix. Even when \(n=m,\) so that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are the same size, it is still false in general that \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}.\)
We may define a pairing \(\mathbb{K}_n \times \mathbb{K}^n \to \mathbb{K}\) of a row vector of length \(n\) and a column vector of length \(n\) by the rule \[(\vec{\xi},\vec{x})\mapsto \vec{\xi}\cdot \vec{x}=\xi_1x_1+\xi_2x_2+\cdots+\xi_n x_n\] for all \(\vec{\xi}=(\xi_i)_{1\leqslant i\leqslant n}\in \mathbb{K}_n\) and for all \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{K}^n.\) So we multiply the first entry of \(\vec{\xi}\) by the first entry of \(\vec{x},\) add the product of the second entries, and continue in this fashion, finishing with the product of the last entries of \(\vec{\xi}\) and \(\vec{x}.\)
The \((i,j)\)-th entry of the matrix product of \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) is then given by the pairing \[[\mathbf{A}\mathbf{B}]_{ij}=\vec{\alpha}_i\vec{b}_j\] of the \(i\)-th row vector \(\vec{\alpha}_i\) of \(\mathbf{A}\) and the \(j\)-th column vector \(\vec{b}_j\) of \(\mathbf{B}.\)
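In code, the definition reads as follows (a sketch in the same illustrative style as before; pairing and mat_mul are our own names): the \((i,k)\) entry of the product is the pairing of the \(i\)-th row of \(\mathbf{A}\) with the \(k\)-th column of \(\mathbf{B}.\)

```python
def pairing(xi, x):
    """Pairing of a row vector xi with a column vector x (plain lists):
    xi_1*x_1 + xi_2*x_2 + ... + xi_n*x_n."""
    return sum(p * q for p, q in zip(xi, x))

def mat_mul(A, B):
    """Product of an m-by-n matrix A with an n-by-r matrix B.
    Entry (i, k) of AB is the pairing of row i of A with column k of B."""
    n = len(B)
    assert len(A[0]) == n, "columns of A must equal rows of B"
    r = len(B[0])
    return [[pairing(row, [B[j][k] for j in range(n)]) for k in range(r)]
            for row in A]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(mat_mul(A, B))  # [[2, 1], [1, 1]]
print(mat_mul(B, A))  # [[1, 1], [1, 2]] -- note that AB != BA
```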
Properties
Here is a long list of (mostly quite straightforward) properties of these matrix operations. Here \(\mathbb{K}\) is any field, \(m, n, p, q, r\) are natural numbers, and \(s, s_1, s_2\) denote scalars in \(\mathbb{K}.\)
(i) \(\mathbf{0}_{m,n}+\mathbf{A}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(ii) \(\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}\) and \((\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})\) for all \(\mathbf{A},\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)
(iii) \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}\) and \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(iv) \(\mathbf{0}_{r,m}\mathbf{A}=\mathbf{0}_{r,n}\) and \(\mathbf{A}\mathbf{0}_{n,r}=\mathbf{0}_{m,r}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(v) (“Associativity of multiplication”) \((\mathbf{A}\mathbf{B}) \mathbf{C}= \mathbf{A}(\mathbf{B}\mathbf{C})\) for all \(\mathbf{A}\in M_{m, p}(\mathbb{K}),\) \(\mathbf{B}\in M_{p, q}(\mathbb{K})\) and \(\mathbf{C}\in M_{q, n}(\mathbb{K})\);
(vi) \((\mathbf{B}+\mathbf{C})\mathbf{A}=\mathbf{B}\mathbf{A}+\mathbf{C}\mathbf{A}\) for all \(\mathbf{B},\mathbf{C}\in M_{r,m}(\mathbb{K})\) and for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(vii) \(\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}\) for all \(\mathbf{A}\in M_{r,m}(\mathbb{K})\) and for all \(\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)
(viii) \(0_{\mathbb{K}} \cdot \mathbf{A}=\mathbf{0}_{m,n}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
(ix) \((s_1s_2)\mathbf{A}=s_1(s_2 \mathbf{A})\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(s_1,s_2 \in \mathbb{K}\);
(x) \(\mathbf{A}(s \mathbf{B})=s(\mathbf{A}\mathbf{B})=(s \mathbf{A})\mathbf{B}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(\mathbf{B}\in M_{n,r}(\mathbb{K})\) and all \(s \in \mathbb{K}\);
(xi) \(s(\mathbf{A}+\mathbf{B})=s \mathbf{A}+s \mathbf{B}\) for all \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(s \in \mathbb{K}\);
(xii) \((s_1+s_2)\mathbf{A}=s_1\mathbf{A}+s_2\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and for all \(s_1,s_2 \in \mathbb{K}.\)
Some of these only involve one of our three matrix operations, and some involve several. (Exercise: make a table showing which properties involve which operations!)
Because of (ii) and (v), we don’t need to write brackets when we deal with sums (or products) of three or more matrices – we can just write \(\mathbf{A}+ \mathbf{B}+ \mathbf{C}\) or \(\mathbf{A}\mathbf{B}\mathbf{C},\) assuming the matrices are of compatible sizes so the operations make sense (and similarly for four or more matrices).
Proof. As a sample, we show properties (iii) and (vii), which are quite easy, and (v), which is slightly harder. The proofs of the remaining ones are similar and/or elementary consequences of the properties of addition and multiplication of scalars.
To show property (iii), consider \(\mathbf{A}\in M_{m,n}(\mathbb{K}).\) Then, by definition, we have for all \(1\leqslant k\leqslant m\) and all \(1\leqslant j\leqslant n\) \[[\mathbf{1}_{m}\mathbf{A}]_{kj}=\sum_{i=1}^m[\mathbf{1}_{m}]_{ki}[\mathbf{A}]_{ij}=\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}=[\mathbf{A}]_{kj},\] where the second-to-last equality uses that \(\delta_{ki}\) is \(0\) unless \(i=k,\) in which case \(\delta_{kk}=1.\) We conclude that \(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}.\) Likewise, we obtain for all \(1\leqslant i\leqslant m\) and all \(1\leqslant k\leqslant n\) \[[\mathbf{A}\mathbf{1}_{n}]_{ik}=\sum_{j=1}^n[\mathbf{A}]_{ij}[\mathbf{1}_{n}]_{jk}=\sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}=[\mathbf{A}]_{ik},\] so that \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}.\) The identities \[\boxed{\sum_{i=1}^m\delta_{ki}A_{ij}=A_{kj}\qquad \text{and}\qquad \sum_{j=1}^nA_{ij}\delta_{jk}=A_{ik}}\] are used repeatedly in Linear Algebra, so make sure you understand them.
For property (vii), applying the definition of matrix multiplication gives \[\mathbf{A}\mathbf{B}=\left(\sum_{i=1}^m A_{ki}B_{ij}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}\quad \text{and}\quad \mathbf{A}\mathbf{C}=\left(\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n},\] so that \[\begin{aligned} \mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}&=\left(\sum_{i=1}^m A_{ki}B_{i j}+\sum_{i=1}^m A_{ki}C_{i j}\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}\\&=\left(\sum_{i=1}^m A_{ki}\left(B_{i j}+C_{i j}\right)\right)_{1\leqslant k\leqslant r, 1\leqslant j \leqslant n}=\mathbf{A}(\mathbf{B}+\mathbf{C}), \end{aligned}\] where we use that \[\mathbf{B}+\mathbf{C}=\left(B_{ij}+C_{ij}\right)_{1\leqslant i\leqslant m, 1\leqslant j \leqslant n}.\]
For property (v), writing \(A_{ij} = [\mathbf{A}]_{ij}\) and so on, we have \[[ (\mathbf{A}\mathbf{B}) \mathbf{C}]_{ij} = \sum_{b = 1}^q [\mathbf{A}\mathbf{B}]_{ib} [\mathbf{C}]_{bj} = \sum_{b = 1}^q \left(\sum_{a = 1}^p A_{ia} B_{ab}\right) C_{bj}.\] This is the sum over all possible products \(A_{ia} B_{ab} C_{bj}\) (with \(i, j\) fixed and \(a, b\) varying); we can group these terms in whichever order we like, so we take the sum over \(a\) to the outside: \[\dots = \sum_{a = 1}^pA_{ia} \left(\sum_{b = 1}^q B_{ab} C_{bj}\right)= \sum_{a = 1}^p[\mathbf{A}]_{ia} [\mathbf{B}\mathbf{C}]_{aj} = [ \mathbf{A}(\mathbf{B}\mathbf{C})]_{ij}\] as required.
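The index manipulations in the proof of (v) are easy to get wrong, so here is a quick numerical sanity check (not a proof!), reusing mat_mul from the sketch above on randomly chosen matrices of compatible sizes:

```python
import random

def random_matrix(m, n):
    """An m-by-n matrix with small random integer entries (for testing)."""
    return [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]

for _ in range(100):
    A = random_matrix(2, 3)
    B = random_matrix(3, 4)
    C = random_matrix(4, 2)
    assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
print("(AB)C == A(BC) held for every sampled triple")
```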
2.3 Transpose and inverse
Transpose
Finally, we may flip a matrix along its “diagonal entries”, that is, we interchange the role of rows and columns. More precisely:
The transpose of a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is the matrix \(\mathbf{A}^{T} \in M_{n,m}(\mathbb{K})\) satisfying \[\left[\mathbf{A}^T\right]_{ij}=[\mathbf{A}]_{ji}\] for all \(1\leqslant i\leqslant n\) and \(1\leqslant j\leqslant m.\)
A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=\mathbf{A}^T\) is called symmetric.
A square matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) that satisfies \(\mathbf{A}=-\mathbf{A}^T\) is called anti-symmetric.
If \[\mathbf{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6\end{pmatrix}, \quad \text{then} \quad \mathbf{A}^T=\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6\end{pmatrix}.\]
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we have by definition \((\mathbf{A}^T)^T=\mathbf{A}.\)
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,r}(\mathbb{K}),\) we have \[(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T.\] Indeed, by definition we have for all \(1\leqslant i\leqslant r\) and all \(1\leqslant j\leqslant m\) \[\left[(\mathbf{A}\mathbf{B})^T\right]_{ij}=[\mathbf{A}\mathbf{B}]_{ji}=\sum_{k=1}^n [\mathbf{A}]_{jk}[\mathbf{B}]_{ki}=\sum_{k=1}^n\left[\mathbf{B}^T\right]_{ik}\left[\mathbf{A}^T\right]_{kj}=\left[\mathbf{B}^T\mathbf{A}^T\right]_{ij}.\]
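Both facts are easy to check numerically; the following sketch (transpose is again our own name, and mat_mul is reused from the earlier sketch) illustrates \((\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T\) on the example above:

```python
def transpose(A):
    """The transpose A^T: entry (i, j) of A^T equals entry (j, i) of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[1, 2], [3, 4], [5, 6]]   # the 3-by-2 example above
print(transpose(A))            # [[1, 3, 5], [2, 4, 6]]

B = [[1, 0, 2], [0, 1, 1]]     # any 2-by-3 matrix, so that AB is defined
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
assert transpose(transpose(A)) == A   # (A^T)^T = A
```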
Transposes won’t play a very big role in this module, but they will be much more important in Linear Algebra II when you start studying bilinear mappings.
Inverses
Let \(\mathbf{A}\in M_{m, n}(\mathbb{K})\) be a matrix. If there exists a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}\) and \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n},\) then we say \(\mathbf{A}\) is invertible.
If such a matrix exists (for a given \(\mathbf{A}\)) then it’s unique (see Exercises). So we can denote this unique matrix (if it exists!) by \(\mathbf{A}^{-1},\) and we call it the inverse of \(\mathbf{A}.\)
If \(\mathbf{A}\in M_{m, n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n, r}(\mathbb{K})\) are both invertible, then \(\mathbf{A}\mathbf{B}\) is invertible and \((\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.\)
Proof. We compute (using part (v) of Proposition 2.16 repeatedly) that \((\mathbf{B}^{-1} \mathbf{A}^{-1}) (\mathbf{A}\mathbf{B}) = \mathbf{B}^{-1} (\mathbf{A}^{-1} \mathbf{A}) \mathbf{B}= \mathbf{B}^{-1} \mathbf{1}_{n} \mathbf{B}= \mathbf{B}^{-1} \mathbf{B}= \mathbf{1}_{r},\) and similarly \((\mathbf{A}\mathbf{B}) \cdot (\mathbf{B}^{-1} \mathbf{A}^{-1}) = \mathbf{1}_{m}.\)
You saw these definitions already in Algorithmics, assuming \(\mathbb{K}= \mathbb{R}\) and, more importantly, that \(m = n.\) The definition still makes logical sense if \(m \ne n,\) but we’ll see later that it is never satisfied – it is a theorem that a non-square matrix cannot be invertible.
Exercises
See the website https://apptest.fernuni.ch/ for worked solutions.
Find two \(2 \times 2\) matrices \(\mathbf{A}\) and \(\mathbf{B}\) with \(\mathbf{A}\mathbf{B}\ne \mathbf{B}\mathbf{A}.\)
Solution
Consider \(\mathbf{A}= \left(\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right)\) and \(\mathbf{B}= \left(\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right).\) Then we compute that \[\mathbf{A}\mathbf{B}= \left(\begin{smallmatrix}2 & 1 \\ 1 & 1 \end{smallmatrix}\right), \qquad \mathbf{B}\mathbf{A}= \left(\begin{smallmatrix}1 & 1 \\ 1 & 2 \end{smallmatrix}\right).\]
Let \(a,b,c,d \in \mathbb{K}\) and \[\mathbf{A}=\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_{2,2}(\mathbb{K}).\] Show that \(\mathbf{A}\) has an inverse \(\mathbf{A}^{-1}\) if and only if \(ad-bc\neq 0.\) For \(ad-bc\neq 0,\) compute the inverse \(\mathbf{A}^{-1}.\)
Solution
Suppose the matrix \(\mathbf{A}\) had an inverse \[\mathbf{A}^{-1}=\begin{pmatrix} u & v \\ w & x \end{pmatrix}.\] Then \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{1}_{2},\) which amounts to the equations \[\begin{aligned} au+bw & = 1, \\ av+bx & = 0,\\ cu+dw & = 0,\\ cv+dx &= 1. \end{aligned}\] Multiplying the first equation by \(c\) and using the third equation leads to \(c=-(ad-bc)w.\) Similarly we find \[\begin{aligned} a& =(ad-bc)x,\\ b & =-(ad-bc)v,\\ c & =-(ad-bc)w,\\ d & =(ad-bc)u. \end{aligned}\] If \(ad-bc=0,\) these equations force \(a=b=c=d=0,\) contradicting the first equation \(au+bw=1\); so no inverse can exist in that case. Conversely, a direct computation shows that \[\begin{pmatrix}a & b \\ c & d\end{pmatrix}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix} = \begin{pmatrix}ad-bc & 0 \\ 0 & ad-bc\end{pmatrix} = \begin{pmatrix}d & -b \\ -c & a\end{pmatrix}\begin{pmatrix}a & b \\ c & d\end{pmatrix},\] so if \(ad-bc\neq 0,\) then \(\mathbf{A}\) is invertible with inverse \[\mathbf{A}^{-1}= \frac{1}{ad-bc}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix}.\]
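As a sanity check on the formula, here is a short Python sketch (inverse_2x2 is our own name; we use fractions.Fraction for exact arithmetic and reuse mat_mul from the earlier sketch):

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2-by-2 matrix by the formula above, or None if ad - bc = 0."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        return None          # no inverse exists when ad - bc = 0
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

A = [[1, 2], [3, 4]]         # here ad - bc = -2, so A is invertible
Ainv = inverse_2x2(A)        # [[-2, 1], [3/2, -1/2]] as Fractions
assert mat_mul(A, Ainv) == [[1, 0], [0, 1]]   # A A^{-1} = 1_2
assert mat_mul(Ainv, A) == [[1, 0], [0, 1]]   # A^{-1} A = 1_2
```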
Let \(\mathbf{A}= (1 \quad 0) \in M_{1, 2}(\mathbb{R}).\)
(i) Find a matrix \(\mathbf{B}\in M_{2, 1}(\mathbb{R})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{1}.\)
(ii) Is the matrix \(\mathbf{B}\) from (i) uniquely determined?
(iii) Show that there does not exist a matrix \(\mathbf{B}\in M_{2, 1}(\mathbb{R})\) with \(\mathbf{B}\mathbf{A}= \mathbf{1}_{2}.\)
(We say that \(\mathbf{A}\) has a right inverse but not a left inverse.)
Solution
(i) The matrix \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\) will work.
(ii) No, because in fact the matrix \(\begin{pmatrix} 1 \\ x \end{pmatrix}\) will work for any value of \(x.\)
(iii) If \(\mathbf{B}= \begin{pmatrix} x \\ y \end{pmatrix},\) then we compute \[\mathbf{B}\mathbf{A}= \begin{pmatrix} x & 0 \\ y & 0 \end{pmatrix},\] so the right-hand column of \(\mathbf{B}\mathbf{A}\) is zero for any \(\mathbf{B}\); hence this matrix cannot equal \(\mathbf{1}_{2}.\)
Let \(\mathbf{A}\in M_{m, n}(\mathbb{K}).\) Suppose there exist matrices \(\mathbf{B},\) \(\mathbf{B}' \in M_{n, m}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}\) and \(\mathbf{B}' \mathbf{A}= \mathbf{1}_{n}.\) Show that we must have \(\mathbf{B}= \mathbf{B}'.\) (Hint: Consider \(\mathbf{B}' \mathbf{A}\mathbf{B}.\)) Hence show that the inverse of a matrix is unique if it exists.
Solution
We have \[\begin{aligned} \mathbf{B}&= \mathbf{1}_{n} \mathbf{B}\\ &= (\mathbf{B}' \mathbf{A}) \mathbf{B}\quad\text{(assumption on $\mathbf{B}'$)} \\ &= \mathbf{B}' (\mathbf{A}\mathbf{B})\quad\text{(associativity)} \\ &= \mathbf{B}' \mathbf{1}_{m} \quad\text{(assumption on $\mathbf{B}$)}\\ &= \mathbf{B}'.\quad \text{QED}. \end{aligned}\] In particular, if \(\mathbf{B}\) and \(\mathbf{B}'\) are both inverses of \(\mathbf{A},\) then they satisfy \(\mathbf{A}\mathbf{B}= \mathbf{1}_m\) and \(\mathbf{B}' \mathbf{A}= \mathbf{1}_n,\) so we must have \(\mathbf{B}' = \mathbf{B}.\)
For each of the following statements, decide whether it is true or false.

If \(\mathbf A,\mathbf B\in M_{n,n}(\mathbb K),\) then \(\mathbf A\mathbf B = \mathbf{0}_n\) implies \(\mathbf A=\mathbf{0}_n\) or \(\mathbf B = \mathbf{0}_n.\)
- True
- False
If \(\mathbf A,\mathbf B\in M_{n,n}(\mathbb K)\) are such that \(\mathbf{A}\mathbf{B}=\mathbf{0}_n,\) then \(\mathbf{B}\mathbf{A}=\mathbf{0}_n.\)
- True
- False
If \(\mathbf D_1,\mathbf D_2\in M_{n,n}(\mathbb{K})\) are diagonal matrices, then \(\mathbf D_1\mathbf D_2=\mathbf D_2\mathbf D_1.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\) are matrices such that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined, then \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are of the same size.
- True
- False
If \(\mathbf{A},\mathbf{B}\) are matrices such that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are defined and \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) are of the same size, then \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are such that \(\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A},\) then \(\mathbf{A}\mathbf{B}\) must be symmetric.
- True
- False
If \(\mathbf{A}\in M_{m,r}(\mathbb{K}),\) \(\mathbf{B}\in M_{r,n}(\mathbb{K})\) and \(\mathbf{C}\in M_{r,m}(\mathbb{K}),\) then the products \(\mathbf{A}\mathbf{B},\) \(\mathbf{A}\mathbf{C},\) \(\mathbf{C}\mathbf{A}\) and \(\mathbf{C}^T\mathbf{B}\) are all defined.
- True
- False
If \(\mathbf A\in M_{2,2}(\mathbb{R})\) is such that \(\mathbf A^2=\mathbf{0}_2,\) then \(\mathbf A=\mathbf{0}_2.\)
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}\mathbf{A}^T\) is always square.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}\mathbf{A}^T\) is always symmetric.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) then \(\mathbf{A}\mathbf{A}^T=\mathbf{A}^T\mathbf{A}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}, \mathbf{C}\in M_{n,n}(\mathbb K),\) and \(\mathbf{A}\mathbf{B}=\mathbf{A}\mathbf{C},\) then \(\mathbf{B}=\mathbf{C}.\)
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}+\mathbf{A}^T\) is always symmetric.
- True
- False
Given \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) the matrix \(\mathbf{A}-\mathbf{A}^T\) is always anti-symmetric.
- True
- False
There is an invertible matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{A}^T = \mathbf{0}_n.\)
- True
- False
There is an invertible matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{A}^T = \mathbf{1}_{n}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are invertible, then so is \(\mathbf{A}\mathbf{B}.\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K})\) are invertible, then so is \(\mathbf{A}+\mathbf{B}.\)
- True
- False
If \(\mathbf A\in M_{n,n}(\mathbb K)\) is such that \[\mathbf{1}_n + \mathbf A + \ldots + \mathbf A^m = \mathbf{0}_n\] for some natural number \(m \geqslant 1,\) then \(\mathbf A\) admits an inverse.
- True
- False
If \(\mathbf A,\mathbf B,\mathbf C\in M_{n,n}(\mathbb K)\) are such that \(\mathbf A\mathbf B = \mathbf A\mathbf C = \mathbf{1}_{n},\) then \(\mathbf B = \mathbf C.\)
- True
- False
If \(\mathbf A \in M_{n,n}(\mathbb K)\) is such that \(\mathbf A^3 = \mathbf{1}_{n}\) and \(\mathbf A\neq\mathbf{1}_{n},\) \(\mathbf A^2\neq\mathbf{1}_{n},\) then \(\mathbf A^{-1} = \mathbf A^{3k+1}\) for any natural number \(k.\)
- True
- False
If \(\mathbf A,\mathbf B \in M_{n,n}(\mathbb K)\) are both invertible, then \((\mathbf A\mathbf B)^{-1} = \mathbf A^{-1}\mathbf B^{-1}.\)
- True
- False
If \(\mathbf A \in M_{n,n}(\mathbb K)\) is invertible, then \(\mathbf{A}^T\) is also invertible and \((\mathbf A^T)^{-1}=(\mathbf A^{-1})^T.\)
- True
- False