3 Independence

This chapter is devoted to independence, which is a fundamentally probabilistic notion. Although the basic idea behind independence is very simple, a precise statement general enough for later applications requires some care.

3.1 Independent events

Independence is classically stated on the level of events. In the real world, two events are typically independent if they are causally unrelated. For instance, if I flip a coin twice, getting heads on the first flip and getting heads on the second flip are independent events. The mathematical definition of course goes beyond any causal or mechanical interpretation in the real world. Informally, \(A\) and \(B\) being independent means that knowing that \(B\) happened gives no information about the probability of \(A\) happening. More formally, the conditional probability (see Definition 2.12) \(\mathbb{P}(A \mid B)\) is equal to \(\mathbb{P}(A).\) This leads to the following definition.
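Indeed, assuming the usual definition \(\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)\) from Definition 2.12, for \(\mathbb{P}(B) > 0\) the two formulations are equivalent: \[\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \mathbb{P}(A) \quad \Longleftrightarrow \quad \mathbb{P}(A \cap B) = \mathbb{P}(A) \cdot \mathbb{P}(B)\,.\] The product formulation below has the advantage of being symmetric in \(A\) and \(B\) and of making sense even when \(\mathbb{P}(B) = 0.\)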

Definition 3.1

Two events \(A,B \in \mathcal A\) are independent if \[\mathbb{P}(A \cap B) = \mathbb{P}(A) \cdot \mathbb{P}(B)\,.\]

Example 3.2

  1. (Example 2.8 continued.) When throwing a die twice, obtaining a six on the first throw and obtaining a six on the second throw are independent events. More precisely, setting \[A = \{6\} \times \{1, \dots, 6\}\,, \qquad B = \{1, \dots, 6\} \times \{6\}\,,\] we find \(\mathbb{P}(A \cap B) = \frac{1}{36} = \mathbb{P}(A) \cdot \mathbb{P}(B).\)

  2. When throwing a single die, the events \[A = \{1,2\} \,, \qquad B = \{1,3,5\}\] are independent: \(\mathbb{P}(A \cap B) = \frac{1}{6} = \mathbb{P}(A) \cdot \mathbb{P}(B).\)
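Both computations in Example 3.2 can be checked mechanically. The following short Python sketch (an added illustration, not part of the measure-theoretic development) enumerates the finite sample spaces and verifies the product formula exactly:

```python
from fractions import Fraction
from itertools import product

def prob(omega, event):
    """Exact probability of an event under the uniform measure on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# (i) Two throws of a fair die: six on the first throw, six on the second.
omega = list(product(range(1, 7), repeat=2))
A = lambda w: w[0] == 6
B = lambda w: w[1] == 6
assert prob(omega, lambda w: A(w) and B(w)) == prob(omega, A) * prob(omega, B) == Fraction(1, 36)

# (ii) A single throw of a fair die.
omega = list(range(1, 7))
A = lambda w: w in {1, 2}
B = lambda w: w in {1, 3, 5}
assert prob(omega, lambda w: A(w) and B(w)) == prob(omega, A) * prob(omega, B) == Fraction(1, 6)
```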

The notion of independence extends from two events to any, possibly infinite, collection of events.

Definition 3.3

A collection of events \(\{A_i\}_{i \in I}\) is independent if for any finite \(J \subset I\) we have \[\mathbb{P}\biggl(\bigcap_{i \in J} A_i\biggr) = \prod_{i \in J} \mathbb{P}(A_i)\,.\]

Remark 3.4

Even if \(I\) is finite, for the collection \(\{A_i\}_{i \in I}\) to be independent, it is not sufficient that \(\mathbb{P}\bigl(\bigcap_{i \in I} A_i\bigr) = \prod_{i \in I} \mathbb{P}(A_i).\)

Moreover, for the collection \(\{A_i\}_{i \in I}\) to be independent, it is not sufficient that all pairs \(A_i\) and \(A_j\) be independent (pairwise independence).

To see this, consider flipping an unbiased coin twice, so that \(\Omega = \{0,1\}^2\) with the uniform probability measure. Define the events \[A = \{1\} \times \{0,1\} \,, \qquad B = \{0,1\} \times \{1\}\,, \qquad C = (\{0\} \times \{0\}) \cup (\{1\} \times \{1\})\,.\] (What is their interpretation?) Then we have \[\begin{aligned} \mathbb{P}(A) = \mathbb{P}(B) = \mathbb{P}(C) &= \frac{1}{2}\,, \\ \mathbb{P}(A \cap B) = \mathbb{P}(A \cap C) = \mathbb{P}(B \cap C) &= \frac{1}{4}\,, \\ \mathbb{P}(A \cap B \cap C) &= \frac{1}{4}\,. \end{aligned}\] We conclude that \(A, B, C\) are pairwise independent, but not independent, since \(\mathbb{P}(A \cap B \cap C) = \frac{1}{4} \neq \frac{1}{8} = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C).\)
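Purely as an illustration, the following short Python sketch (added here, not part of the original notes) verifies these probabilities by enumerating the four equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))   # two flips of a fair coin
prob = lambda event: Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 1                   # first flip shows 1
B = lambda w: w[1] == 1                   # second flip shows 1
C = lambda w: w[0] == w[1]                # the two flips agree

# Pairwise independence holds for all three pairs ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y) == Fraction(1, 4)

# ... but the triple product formula fails: 1/4 != 1/8.
assert prob(lambda w: A(w) and B(w) and C(w)) == Fraction(1, 4)
assert prob(A) * prob(B) * prob(C) == Fraction(1, 8)
```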

3.2 Intermezzo: monotone class lemma*

In order to extend the notion of independence to random variables, a notion that plays a fundamental role in probability, we shall need a powerful tool from measure theory: the monotone class lemma. It is usually not covered in a first course on measure theory. Thus, in this section we make a measure-theoretic excursion. The section is marked with an asterisk, which means that it does not belong to the core material of the course; in particular, if you wish you can skip the proofs, which will not be examined. All that you have to know from this section is Definition 3.7 and Corollary 3.9.

Let \(E\) be a set.

Definition 3.5

A collection \(\mathcal M \subset \mathcal P(E)\) is a monotone class if

  1. \(E \in \mathcal M\);

  2. If \(A,B \in \mathcal M\) satisfy \(A \subset B\) then \(B \setminus A \in \mathcal M\);

  3. If \(A_n \in \mathcal M\) and \(A_n \subset A_{n+1}\) for all \(n \in \mathbb{N}\) then \(\bigcup_{n \in \mathbb{N}} A_n \in \mathcal M.\)

The term monotone class comes from the last property, which distinguishes it from a \(\sigma\)-algebra, and states that the union of an increasing family of elements of \(\mathcal M\) is again in \(\mathcal M.\) There is a priori no very clear intuitive interpretation of this definition. Its usefulness will become apparent a posteriori, through its applications; see for instance Corollary 3.9 and Example 3.10 below.

Similarly to Definition 1.2, any collection of subsets of \(E\) generates a monotone class.

Definition 3.6

The monotone class generated by a collection \(\mathcal C \subset \mathcal P(E)\) is \[\mathcal M(\mathcal C) :=\bigcap_{\substack{\mathcal M \text{ is a monotone class}\\ \mathcal C \subset \mathcal M}} \mathcal M\,.\]

It is left as an exercise to check that the intersection of monotone classes is a monotone class, so that in particular \(\mathcal M(\mathcal C)\) is always a monotone class.

The following result is the main tool proved in this section. To state it, we need the following definition.

Definition 3.7

A collection \(\mathcal C \subset \mathcal P(E)\) is stable under finite intersections if for any \(A, B \in \mathcal C\) we have \(A \cap B \in \mathcal C.\)

Proposition 3.8 • Monotone class lemma

If \(\mathcal C \subset \mathcal P(E)\) is stable under finite intersections, then \(\mathcal M(\mathcal C) = \sigma(\mathcal C).\)

Proof. Note first that a \(\sigma\)-algebra is a monotone class (this is left as an easy exercise). Hence, we trivially have the inclusion \(\mathcal M(\mathcal C) \subset \sigma(\mathcal C).\)

To prove the reverse inclusion, \(\sigma(\mathcal C) \subset \mathcal M(\mathcal C),\) it suffices to show that \(\mathcal M(\mathcal C)\) is a \(\sigma\)-algebra: indeed, \(\mathcal M(\mathcal C)\) contains \(\mathcal C,\) and \(\sigma(\mathcal C)\) is the smallest \(\sigma\)-algebra containing \(\mathcal C.\)

We shall proceed in several steps.

Claim. A monotone class \(\mathcal M\) is a \(\sigma\)-algebra if and only if it is stable under finite intersections.

Clearly, a \(\sigma\)-algebra is a monotone class that is stable under finite intersections. For the reverse implication, suppose that \(\mathcal M\) is a monotone class that is stable under finite intersections. Note first that \(\mathcal M\) is stable under complements, since \(A^c = E \setminus A \in \mathcal M\) for any \(A \in \mathcal M,\) by properties (i) and (ii) of Definition 3.5. Then \(\mathcal M\) is also stable under finite unions, since \[A, B \in \mathcal M \; \Rightarrow\; A^c, B^c \in \mathcal M \; \Rightarrow\; A^c \cap B^c \in \mathcal M \; \Rightarrow\; A \cup B \in \mathcal M\,.\] Let now \(A_1, A_2, \dots \in \mathcal M\) and set \(B_n :=A_1 \cup \cdots \cup A_n.\) Then, by the property we just proved, \(B_n \in \mathcal M.\) Moreover, since \(B_n \subset B_{n+1},\) by Definition 3.5 we conclude that \(\bigcup_{n}A_n = \bigcup_n B_n \in \mathcal M.\) We have therefore verified Definition 1.1, and hence proved the Claim.

By the Claim, it suffices to show that \(\mathcal M(\mathcal C)\) is stable under finite intersections, i.e. \[\tag{3.1} A,B \in \mathcal M(\mathcal C) \quad \Longrightarrow \quad A \cap B \in \mathcal M(\mathcal C)\,.\] To that end, we first fix \(A \in \mathcal C,\) and define \[\mathcal M_1 :=\{B \in \mathcal M(\mathcal C) \,\colon A \cap B \in \mathcal M(\mathcal C)\}\,.\] Since by assumption \(\mathcal C\) is stable under finite intersections, we have \[\tag{3.2} \mathcal C \subset \mathcal M_1\,.\] Moreover, we claim that \[\tag{3.3} \mathcal M_1 \text{ is a monotone class}.\] To verify (3.3), let us verify the three properties (i)–(iii) of Definition 3.5. Property (i) is trivial. To verify (ii), we take \(B, B' \in \mathcal M_1\) satisfying \(B \subset B',\) and note that \[A \cap (B' \setminus B) = (A \cap B') \setminus (A \cap B) \in \mathcal M(\mathcal C)\,,\] where the last step follows from the facts that \(\mathcal M(\mathcal C)\) is a monotone class and that \(A \cap B'\) and \(A \cap B\) are in \(\mathcal M(\mathcal C)\) by definition of \(\mathcal M_1.\) This shows that \(B' \setminus B \in \mathcal M_1,\) and hence yields (ii). Finally, to prove (iii), let us take \(B_n \in \mathcal M_1\) such that \(B_n \subset B_{n+1}.\) Then \[A \cap \biggl(\bigcup_n B_n\biggr) = \bigcup_n (A \cap B_n) \in \mathcal M(\mathcal C)\,,\] since \(A \cap B_n \in \mathcal M(\mathcal C)\) by definition of \(\mathcal M_1,\) and \(\mathcal M(\mathcal C)\) is a monotone class. We conclude that \(\bigcup_n B_n \in \mathcal M_1.\) This concludes the proof of property (iii), and hence of (3.3).

Next, from (3.2) and (3.3), we deduce that \(\mathcal M(\mathcal C) \subset \mathcal M_1.\) This means that \[\tag{3.4} \forall A \in \mathcal C, \forall B \in \mathcal M(\mathcal C), A \cap B \in \mathcal M(\mathcal C)\,.\]

Next, we repeat exactly the same argument with fixed \(B \in \mathcal M(\mathcal C)\) and \[\mathcal M_2 :=\{A \in \mathcal M(\mathcal C) \,\colon A \cap B \in \mathcal M(\mathcal C)\}\,.\] From (3.4) we know that \(\mathcal C \subset \mathcal M_2.\)

We may repeat the proof of (3.3) almost to the letter to show that \(\mathcal M_2\) is a monotone class. Since \(\mathcal C \subset \mathcal M_2,\) we conclude that \(\mathcal M(\mathcal C) \subset \mathcal M_2.\) This immediately implies (3.1), and hence concludes the proof.

The monotone class lemma may seem rather abstract, but it is very useful in probability. It allows us to verify equality of two probability measures \(\mu\) and \(\nu\) on a much smaller collection \(\mathcal C\) of events than the full \(\sigma\)-algebra. Typically, verifying the equality \(\mu(A) = \nu(A)\) directly for all \(A \in \mathcal A\) is practically impossible. However, it is often very easy to construct a simple collection of events \(\mathcal C\) (for instance intervals, rectangles, or cylinder sets) on which the equality is trivial. The monotone class lemma then allows us to deduce equality on all sets \(A \in \mathcal A.\) That is its great power. The following result is a typical application of this idea.

Corollary 3.9

Let \(\mu\) and \(\nu\) be two probability measures on \((\Omega, \mathcal A).\) If there exists a collection \(\mathcal C \subset \mathcal A\) that is stable under finite intersections such that \(\sigma(\mathcal C) = \mathcal A\) and \(\mu(A) = \nu(A)\) for all \(A \in \mathcal C,\) then \(\mu = \nu.\)

Proof. Let \(\mathcal G :=\{A \in \mathcal A \,\colon\mu(A) = \nu(A)\}.\) Then \(\mathcal C \subset \mathcal G\) and it is easy to check that \(\mathcal G\) is a monotone class. Moreover, by Proposition 3.8, \[\mathcal M(\mathcal C) = \sigma(\mathcal C) = \mathcal A\,,\] and the claim follows since \(\mathcal M(\mathcal C) \subset \mathcal G.\)
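For completeness, here is a sketch of the check that \(\mathcal G\) is a monotone class; it uses only that \(\mu\) and \(\nu\) are probability measures. Property (i) of Definition 3.5 holds since \(\mu(\Omega) = 1 = \nu(\Omega).\) For properties (ii) and (iii), if \(A, B \in \mathcal G\) with \(A \subset B,\) and if \(A_n \in \mathcal G\) with \(A_n \subset A_{n+1},\) then \[\mu(B \setminus A) = \mu(B) - \mu(A) = \nu(B) - \nu(A) = \nu(B \setminus A)\,, \qquad \mu\biggl(\bigcup_n A_n\biggr) = \lim_{n \to \infty} \mu(A_n) = \lim_{n \to \infty} \nu(A_n) = \nu\biggl(\bigcup_n A_n\biggr)\,,\] by finite additivity and continuity from below, so that \(B \setminus A \in \mathcal G\) and \(\bigcup_n A_n \in \mathcal G.\)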

We shall use Corollary 3.9 throughout this class. Proposition 3.12 below is a typical application. Here is an immediate application that shows its power in proving a famous and nontrivial result.

Example 3.10 • Uniqueness of Lebesgue measure

There exists at most one probability measure \(\lambda\) on \(([0,1], \mathcal B([0,1]))\) such that \(\lambda((a,b]) = b-a\) for all \(0 < a < b \leqslant 1.\) For the proof, simply invoke Corollary 3.9 with \(\mathcal C = \{(a,b] \,\colon 0 < a < b \leqslant 1\},\) the set of half-open intervals, which is stable under finite intersections (the intersection of two half-open intervals is again a half-open interval or the empty set, and adjoining \(\varnothing\) to \(\mathcal C\) changes nothing).
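To see how the values on \(\mathcal C\) propagate, note that (assuming such a measure \(\lambda\) exists) continuity from above of probability measures already determines, for instance, the measure of points and of closed intervals: for \(0 < a < b \leqslant 1,\) \[\lambda(\{a\}) = \lim_{n \to \infty} \lambda\bigl(\bigl(a - \tfrac{1}{n}, a\bigr]\bigr) = \lim_{n \to \infty} \tfrac{1}{n} = 0\,, \qquad \lambda([a,b]) = \lambda(\{a\}) + \lambda((a,b]) = b - a\,,\] where the first equality uses the decreasing sets \((a - \frac{1}{n}, a] \downarrow \{a\}\) (for \(n\) large enough that \(a - \frac{1}{n} > 0\)).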

3.3 Independent \(\sigma\)-algebras and random variables

On the most fundamental, and general, level, independence is formulated for \(\sigma\)-algebras. This notion then naturally extends to random variables through their generated \(\sigma\)-algebras (Definition 2.35).

Definition 3.11

  1. The \(\sigma\)-algebras \(\mathcal B_1, \dots, \mathcal B_n \subset \mathcal A\) are independent if for all \(A_1 \in \mathcal B_1, \dots, A_n \in \mathcal B_n\) we have \(\mathbb{P}(A_1 \cap \dots \cap A_n) = \mathbb{P}(A_1) \cdots \mathbb{P}(A_n).\)

  2. The random variables \(X_1, \dots, X_n\) are independent if \(\sigma(X_1), \dots, \sigma(X_n)\) are independent.

Explicitly, recalling Definition 2.35, we see that (ii) means that for all \(F_1 \in \mathcal E_1, \dots, F_n \in \mathcal E_n\) we have \[\tag{3.5} \mathbb{P}(X_1 \in F_1, \dots, X_n \in F_n) = \mathbb{P}(X_1 \in F_1) \cdots \mathbb{P}(X_n \in F_n)\,,\] where \(X_i\) takes values in the measurable space \((E_i, \mathcal E_i).\)

The following result is very convenient: it gives a concrete characterisation of independence that is particularly useful when working with independent random variables.

Proposition 3.12

The random variables \(X_1, \dots, X_n\) are independent if and only if the law of \((X_1, \dots, X_n)\) is the product of the laws of \(X_1,\) …, \(X_n,\) i.e. \[\tag{3.6} \mathbb{P}_{(X_1, \dots, X_n)} = \mathbb{P}_{X_1} \otimes \cdots \otimes \mathbb{P}_{X_n}\,.\] In this case we have \[\mathbb{E}[f_1(X_1) \cdots f_n(X_n)] = \mathbb{E}[f_1(X_1)] \cdots \mathbb{E}[f_n(X_n)]\] for any measurable and nonnegative functions \(f_i.\)

Proof. Let \((E_i, \mathcal E_i)\) be the target space of \(X_i.\) Let \(F_i \in \mathcal E_i\) for all \(i.\) Then we have \[\begin{aligned} \mathbb{P}_{(X_1, \dots, X_n)}(F_1 \times \cdots \times F_n) &= \mathbb{P}(X_1 \in F_1, \dots, X_n \in F_n)\,, \\ \mathbb{P}_{X_1} \otimes \cdots \otimes \mathbb{P}_{X_n}(F_1 \times \cdots \times F_n) &= \mathbb{P}(X_1 \in F_1) \cdots \mathbb{P}(X_n \in F_n)\,. \end{aligned}\] Using (3.5), we conclude that \(X_1, \dots, X_n\) are independent if and only if \(\mathbb{P}_{(X_1, \dots, X_n)}\) and \(\mathbb{P}_{X_1} \otimes \cdots \otimes \mathbb{P}_{X_n}\) coincide on all rectangles of the form \(F_1 \times \dots \times F_n.\) The family of such rectangles, \[\mathcal C = \{F_1 \times \dots \times F_n \,\colon F_i \in \mathcal E_i \, \forall i\}\] is stable under finite intersections (Definition 3.7) and it satisfies \(\sigma(\mathcal C) = \mathcal E_1 \otimes \cdots \otimes \mathcal E_n\) (recall Example 1.3 (ii)). By Corollary 3.9 we therefore conclude that \(X_1, \dots, X_n\) are independent if and only if \(\mathbb{P}_{(X_1, \dots, X_n)} = \mathbb{P}_{X_1} \otimes \cdots \otimes \mathbb{P}_{X_n}.\)

For the last statement, we use the Fubini-Tonelli theorem (Proposition 1.14) to conclude \[\begin{gathered} \mathbb{E}\Biggl[\prod_i f_i(X_i)\Biggr] = \int \prod_i f_i(x_i) \, \mathbb{P}_{X_1}(\mathrm dx_1) \cdots \mathbb{P}_{X_n}(\mathrm dx_n) \\ = \prod_i \int f_i(x_i) \, \mathbb{P}_{X_i}(\mathrm dx_i) = \prod_i \mathbb{E}[f_i(X_i)]\,. \end{gathered}\]

Proposition 3.12 shows how to construct independent random variables \(X_1, \dots, X_n\) with given laws \(\mu_1, \dots, \mu_n\) on the spaces \((E_1, \mathcal E_1), \dots, (E_n, \mathcal E_n),\) respectively. Indeed, simply choose \(\Omega = E_1\times \cdots \times E_n,\) \(\mathcal A = \mathcal E_1 \otimes \dots \otimes \mathcal E_n,\) \(\mathbb{P}= \mu_1 \otimes \dots \otimes \mu_n,\) and set \(X_i(\omega_1, \dots, \omega_n) :=\omega_i.\) Clearly, (3.6) holds.
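As a purely numerical illustration of this construction and of the factorisation in Proposition 3.12, here is a short Python sketch (the use of NumPy and the laws \(\mathrm{Exp}(1)\) and \(\mathrm{Unif}(0,1)\) are arbitrary choices made for this example): sampling the two coordinates independently realises the product measure, and the expectation of a product factorises up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Sample the two coordinates independently: this realises the product measure
# mu_1 (x) mu_2 with mu_1 = Exp(1) and mu_2 = Unif(0, 1) (arbitrary choices).
x1 = rng.exponential(1.0, size=n)
x2 = rng.uniform(0.0, 1.0, size=n)

f1 = lambda x: np.cos(x)       # any bounded measurable test functions
f2 = lambda x: x ** 2

lhs = np.mean(f1(x1) * f2(x2))             # E[f1(X1) f2(X2)]
rhs = np.mean(f1(x1)) * np.mean(f2(x2))    # E[f1(X1)] E[f2(X2)]
print(lhs, rhs)                            # agree up to Monte Carlo error
```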

Let us record some obvious but important properties of independent random variables.

Remark 3.13

  1. If \(X_1, \dots, X_n\) are independent random variables with values in \(\mathbb{R},\) then \(\mathbb{E}[X_1 \cdots X_n] = \mathbb{E}[X_1] \cdots \mathbb{E}[X_n]\) provided that \(\mathbb{E}[\lvert X_i \rvert] < \infty\) for all \(i.\) In particular, if \(X_1, \dots, X_n \in L^1\) then \(X_1 \cdots X_n \in L^1.\) Without independence, this is in general false: for instance, if \(X_1 = X_2 = X \in L^1,\) then in general \(X \notin L^2\) (i.e. \(X^2 \notin L^1\)).

  2. If \(X_1, X_2 \in L^2\) are independent then \(\mathop{\mathrm{Cov}}(X_1, X_2) = 0.\) In words: independent random variables are uncorrelated. The reverse implication (uncorrelated random variables are independent) is in general false.

Example 3.14 To illustrate (ii), consider a real-valued random variable \(X_1 \in L^2\) with a symmetric density \(p,\) i.e. \(p(x) = p(-x).\) Let \(\chi \in \{\pm1\}\) be a random variable with law \(\mathbb{P}(\chi = +1) = \mathbb{P}(\chi = -1) = \frac{1}{2}.\) Let \(X_1\) and \(\chi\) be independent. Define \(X_2 :=\chi \cdot X_1.\) Then we have \[\mathop{\mathrm{Cov}}(X_1, X_2) = \mathbb{E}[X_1 X_2] = \mathbb{E}[\chi X_1^2] = \mathbb{E}[\chi] \mathbb{E}[X_1^2] = 0\,,\] where the first equality uses that \(\mathbb{E}[X_2] = \mathbb{E}[\chi] \, \mathbb{E}[X_1] = 0,\) so that \(X_1\) and \(X_2\) are uncorrelated. Nevertheless, \(X_1\) and \(X_2\) are not independent. Indeed, if they were independent, then \(\lvert X_1 \rvert\) and \(\lvert X_2 \rvert = \lvert X_1 \rvert\) would also be independent, but a random variable that is independent of itself is necessarily almost surely constant. Clearly, \(\lvert X_1 \rvert\) is not almost surely constant, since it has density \(2 p(x) \mathbf 1_{x \geqslant 0}.\)
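Here is a short numerical sketch of Example 3.14 (an added illustration; taking \(X_1\) standard normal is one possible choice of symmetric density): the sample covariance is close to zero, while the product formula for events fails.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

x1 = rng.standard_normal(n)                # a choice of symmetric density p
chi = rng.choice([-1.0, 1.0], size=n)      # independent sign, P(+1) = P(-1) = 1/2
x2 = chi * x1

# Uncorrelated: the sample covariance is close to zero.
print(np.cov(x1, x2)[0, 1])

# Not independent: since |X2| = |X1|, the product formula fails for these events.
p_joint = np.mean((np.abs(x1) > 1) & (np.abs(x2) > 1))        # equals P(|X1| > 1)
p_product = np.mean(np.abs(x1) > 1) * np.mean(np.abs(x2) > 1)
print(p_joint, p_product)                                     # roughly 0.32 vs 0.10
```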

Remarkably, if the law of \((X_1, X_2)\) is Gaussian, then independence of \(X_1\) and \(X_2\) is equivalent to them being uncorrelated. This is a consequence of Wick’s theorem (Exercise 3.2).

Remark 3.15

Let \(X, Y, Z\) be independent random variables. Then \(X\) and \(f(Y,Z)\) are independent for any measurable function \(f.\) Indeed, by Proposition 3.12, \[\begin{gathered} \mathbb{P}(X \in A, f(Y,Z) \in B) = (\mathbb{P}_X \otimes \mathbb{P}_Y \otimes \mathbb{P}_Z)(A \times f^{-1}(B)) \\ = \mathbb{P}_X(A) \cdot (\mathbb{P}_Y \otimes \mathbb{P}_Z)(f^{-1}(B)) = \mathbb{P}(X \in A) \cdot\mathbb{P}(f(Y,Z) \in B)\,. \end{gathered}\] This principle of regrouping random variables generalises in the obvious way to more random variables.
