4 Convergence of random variables
In this chapter we study the convergence of random variables in detail. We shall consider the most important notions of convergence: almost sure, in probability, in \(L^p,\) and in law.
4.1 Notions of convergence
Let \((X_n)_{n \in \mathbb{N}^*}\) and \(X\) be random variables with values in \(\mathbb{R}.\) In this section we introduce different notions of the convergence \(X_n \to X\) and investigate the logical implications between them.
Recall that we have already seen two notions of convergence:
- \(X_n \overset{\text{a.s.}}{\longrightarrow}X\) if \(\mathbb{P}(\lim_n X_n = X) = 1.\)
- \(X_n \overset{L^p}{\longrightarrow}X\) if \(\lim_n \mathbb{E}[\lvert X_n - X \rvert^p] = 0.\)
The following definition is very useful, and specific to probability.
The random variables \(X_n\) converge in probability to \(X,\) denoted \(X_n \overset{\mathbb{P}}{\longrightarrow}X,\) if for all \(\varepsilon> 0\) we have \[\lim_n \mathbb{P}(\lvert X_n - X \rvert > \varepsilon) = 0\,.\]
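To get a feel for this definition, here is a minimal numerical sketch (the example and all names in the code are ours, purely for illustration): we take \(X_n\) to be the sample mean of \(n\) i.i.d. Uniform\((0,1)\) variables, so that \(X_n \overset{\mathbb{P}}{\longrightarrow} 1/2,\) and estimate \(\mathbb{P}(\lvert X_n - 1/2 \rvert > \varepsilon)\) by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05
m = 10_000  # number of Monte Carlo samples of X_n used to estimate the probability

for n in [10, 50, 100, 500, 1000]:
    # X_n = sample mean of n i.i.d. Uniform(0,1) variables, so X = 1/2 here.
    X_n = rng.uniform(size=(m, n)).mean(axis=1)
    p_hat = np.mean(np.abs(X_n - 0.5) > eps)
    print(f"n = {n:5d}:  P(|X_n - 1/2| > {eps}) ≈ {p_hat:.4f}")
```

The estimated probabilities decrease towards \(0\) as \(n\) grows, as the definition requires.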
It is often useful to observe that this notion of convergence is metrisable, i.e. it arises from a metric on the space of (equivalence classes of) random variables.
Let \(\mathcal L^0\) be the space of random variables on \((\Omega, \mathcal A, \mathbb{P})\) with values in \(\mathbb{R},\) and let \(L^0 :=\mathcal L^0 / \sim,\) where \(\sim\) is the equivalence relation defined by \(X \sim Y\) if and only if \(X = Y\) almost surely. For \(X,Y \in L^0\) we define \[d(X,Y) :=\mathbb{E}[\lvert X - Y \rvert \wedge 1]\,.\]
The space \((L^0, d)\) is a complete metric space, and \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) if and only if \(d(X_n, X) \to 0.\)
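Before turning to the proof, the equivalence can again be checked numerically on the toy sequence above (a sketch only, under the same illustrative assumptions): the Monte Carlo estimates of \(d(X_n, 1/2) = \mathbb{E}[\lvert X_n - 1/2 \rvert \wedge 1]\) decrease to \(0\) as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10_000  # Monte Carlo samples per value of n

def d_hat(xs, x):
    # Monte Carlo estimate of d(X, Y) = E[|X - Y| /\ 1], with Y = x a constant.
    return np.mean(np.minimum(np.abs(xs - x), 1.0))

for n in [10, 100, 1000]:
    X_n = rng.uniform(size=(m, n)).mean(axis=1)  # same toy sequence as before
    print(f"n = {n:5d}:  d(X_n, 1/2) ≈ {d_hat(X_n, 0.5):.5f}")
```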
Proof. It is easy to check that \(d\) is a metric on \(L^0\); note in particular that \(d(X,Y) = 0\) if and only if \(X = Y\) almost surely, which is why we pass to equivalence classes.
Let us now verify that \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) implies \(d(X_n, X) \to 0.\) Suppose that \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) and choose an arbitrary \(\varepsilon\in (0,1].\) Then \[\begin{gathered} d(X_n, X) = \mathbb{E}[\lvert X_n - X \rvert \wedge 1] = \mathbb{E}[\lvert X_n - X \rvert \, \mathbf 1_{\lvert X_n - X \rvert \leqslant\varepsilon}] + \mathbb{E}[(\lvert X_n - X \rvert \wedge 1) \, \mathbf 1_{\lvert X_n - X \rvert > \varepsilon}] \\ \leqslant\varepsilon+ \mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \longrightarrow \varepsilon\,, \end{gathered}\] by assumption, so that \(\limsup_n d(X_n, X) \leqslant\varepsilon.\) Since \(\varepsilon\in (0,1]\) was arbitrary, we conclude that \(d(X_n, X) \to 0.\)
Conversely, suppose that \(d(X_n, X) \to 0.\) For all \(\varepsilon\in (0,1]\) we have \(\{\lvert X_n - X \rvert > \varepsilon\} = \{\lvert X_n - X \rvert \wedge 1 > \varepsilon\},\) so by Chebyshev’s inequality applied to the bounded random variable \(\lvert X_n - X \rvert \wedge 1,\) \[\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \leqslant\frac{1}{\varepsilon} \mathbb{E}[\lvert X_n - X \rvert \wedge 1] = \frac{1}{\varepsilon}\, d(X_n, X) \rightarrow 0\,,\] i.e. \(X_n \overset{\mathbb{P}}{\longrightarrow}X.\)
All that remains, therefore, is to show that the metric space \((L^0,d)\) is complete. To that end, let \((X_n)\) be a Cauchy sequence for \(d(\cdot, \cdot)\) and choose a subsequence \(Y_k = X_{n_k}\) such that \(d(Y_k, Y_{k+1}) \leqslant 2^{-k}.\) Since \[\mathbb{E}\Biggl[\sum_{k = 1}^\infty (\lvert Y_{k+1} - Y_k \rvert \wedge 1)\Biggr] \leqslant\sum_{k = 1}^\infty 2^{-k} < \infty\,,\] the nonnegative random variable \(\sum_{k} (\lvert Y_{k+1} - Y_k \rvert \wedge 1)\) has finite expectation and is therefore almost surely finite (the same argument as in the Borel-Cantelli lemma; see also Remark 3.18), i.e. \[\sum_{k = 1}^\infty (\lvert Y_{k+1} - Y_k \rvert \wedge 1) < \infty \quad \text{a.s.}\,.\] On this event the summands tend to zero, so that \(\lvert Y_{k+1} - Y_k \rvert < 1\) for all large \(k\) and hence \[\sum_{k = 1}^\infty \lvert Y_{k+1} - Y_k \rvert < \infty \quad \text{a.s.}\,.\] Defining \[X :=Y_1 + \sum_{k = 1}^\infty (Y_{k+1} - Y_k)\] on this almost sure event (and, say, \(X :=0\) on its complement), we therefore have \(Y_k \overset{\text{a.s.}}{\longrightarrow} X\) as \(k \to \infty.\) Hence, \[d(Y_k, X) = \mathbb{E}[\lvert Y_k - X \rvert \wedge 1] \longrightarrow 0\] as \(k \to \infty,\) by dominated convergence (the integrand is bounded by \(1\)). Finally, since \((X_n)\) is Cauchy and \(d(X_n, X) \leqslant d(X_n, Y_k) + d(Y_k, X),\) we conclude that \(d(X_n, X) \to 0\) as \(n \to \infty.\)
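The subsequence construction in the proof can also be visualised numerically (a sketch only, with an artificial coupled example of our own choosing): taking \(X_n\) to be the running mean of i.i.d. Uniform\((0,1)\) variables on a common sample space and \(Y_k :=X_{4^k},\) a direct variance computation gives \(d(Y_k, Y_{k+1}) \leqslant\mathbb{E}\lvert Y_{k+1} - Y_k \rvert \leqslant 2^{-k},\) and the telescoping sum \(\sum_k \lvert Y_{k+1} - Y_k \rvert\) stays bounded along the simulated paths.

```python
import numpy as np

rng = np.random.default_rng(2)
m, K = 500, 7                       # number of sample paths, number of subsequence terms
U = rng.uniform(size=(m, 4 ** K))   # one i.i.d. Uniform(0,1) sequence per path

# Coupled sequence X_n = mean of the first n uniforms; subsequence Y_k = X_{4^k}.
# A direct variance computation gives E|Y_{k+1} - Y_k| <= 2^{-k} for this choice.
Y = np.stack([U[:, : 4 ** k].mean(axis=1) for k in range(1, K + 1)], axis=1)

diffs = np.abs(np.diff(Y, axis=1))            # |Y_{k+1} - Y_k| along each path
d_est = np.minimum(diffs, 1.0).mean(axis=0)   # estimates of d(Y_k, Y_{k+1})
print("estimated d(Y_k, Y_{k+1}):", np.round(d_est, 4))
print("bounds 2^{-k}:            ", [2.0 ** -k for k in range(1, K)])
print("largest value over paths of the telescoping sum:",
      round(float(diffs.sum(axis=1).max()), 3))
```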
The argument in the preceding proof of completeness illustrates a general and important fact from probability and measure theory: convergence in probability does not in general imply almost sure convergence, but it does along a suitable subsequence. This is made precise in the following proposition.
Let \(X_n, X\) be random variables with values in \(\mathbb{R}.\)
- (i) If \(X_n \overset{\text{a.s.}}{\longrightarrow} X\) or \(X_n \overset{L^p}{\longrightarrow} X\) then \(X_n \overset{\mathbb{P}}{\longrightarrow} X.\)
- (ii) If \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) then there exists a subsequence \((X_{n_k})\) such that \(X_{n_k} \overset{\text{a.s.}}{\longrightarrow} X.\)
Proof. Part (ii) was already established in the proof of Proposition 4.3. For part (i), fix \(\varepsilon> 0.\) If \(X_n \overset{\text{a.s.}}{\longrightarrow} X\) then the indicators \(\mathbf 1_{\lvert X_n - X \rvert > \varepsilon}\) tend to \(0\) almost surely and are bounded by \(1,\) so \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) = \mathbb{E}[\mathbf 1_{\lvert X_n - X \rvert > \varepsilon}] \to 0\) by dominated convergence. If \(X_n \overset{L^p}{\longrightarrow} X\) then, by Chebyshev’s inequality, \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \leqslant\frac{1}{\varepsilon^p} \mathbb{E}[\lvert X_n - X \rvert^p] \to 0.\)
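The bound used in the \(L^p\) case can be checked numerically (a sketch only, with \(p = 2\) and the illustrative running-mean sequence from above, where \(X = 1/2\)): both \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon)\) and the bound \(\varepsilon^{-p}\, \mathbb{E}[\lvert X_n - X \rvert^p]\) tend to \(0,\) and the former never exceeds the latter.

```python
import numpy as np

rng = np.random.default_rng(3)
m, eps, p = 20_000, 0.05, 2

for n in [10, 100, 1000]:
    X_n = rng.uniform(size=(m, n)).mean(axis=1)        # toy sequence with X = 1/2
    lhs = np.mean(np.abs(X_n - 0.5) > eps)             # P(|X_n - X| > eps)
    rhs = np.mean(np.abs(X_n - 0.5) ** p) / eps ** p   # (1/eps^p) E|X_n - X|^p
    print(f"n = {n:5d}:  probability ≈ {lhs:.4f}   Chebyshev bound ≈ {rhs:.4f}")
```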
In Proposition 4.4 (ii) it is in general necessary to pass to a subsequence; see Remark 3.26. (In that example, after passing to a subsequence we can ensure that \(\sum_{n} b_n < \infty.\))
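Remark 3.26 is not reproduced here, but the phenomenon can be illustrated with a standard example (a sketch only, possibly different from the example in that remark): independent indicators \(X_n\) with \(\mathbb{P}(X_n = 1) = 1/n\) converge to \(0\) in probability, yet by the second Borel-Cantelli lemma \(X_n = 1\) infinitely often almost surely, so there is no almost sure convergence; along the subsequence \(n_k = k^2\) we have \(\sum_k 1/k^2 < \infty,\) and hence \(X_{n_k} \overset{\text{a.s.}}{\longrightarrow} 0\) by the first Borel-Cantelli lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
m, N = 1_000, 10_000                       # sample paths, time horizon
n = np.arange(1, N + 1)

# Independent indicators with P(X_n = 1) = 1/n: convergence to 0 in probability,
# but X_n = 1 infinitely often a.s. by the second Borel-Cantelli lemma.
X = rng.random((m, N)) < 1.0 / n           # X[path, n-1] is X_n on that path

# Fraction of paths with some X_n = 1 in the window N/2 <= n <= N; this stays
# near 1/2 however large N is, so sup_{n >= N/2} X_n does not tend to 0.
tail_hit = X[:, N // 2 - 1 :].any(axis=1).mean()

# Along n_k = k^2 we have sum_k 1/k^2 < infinity, so hits in the same window
# become rare and X_{n_k} -> 0 almost surely (first Borel-Cantelli lemma).
sub = np.arange(1, int(np.sqrt(N)) + 1) ** 2
sub_hit = X[:, sub[sub >= N // 2] - 1].any(axis=1).mean()

print(f"P(X_n = 1 for some n in [N/2, N])       ≈ {tail_hit:.3f}")
print(f"P(X_(k^2) = 1 for some k^2 in [N/2, N]) ≈ {sub_hit:.3f}")
```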