4.3 Convergence in law

In this section we introduce the final notion of convergence of random variables in this course. We fix the dimension \(d \in \mathbb{N}^*\) throughout. We denote by \(C_b \equiv C_b(\mathbb{R}^d)\) the space of bounded continuous real-valued functions on \(\mathbb{R}^d.\)

Definition 4.11

  1. Let \(\mu_n,\) \(n \in \mathbb{N}^*,\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) We say that \(\mu_n\) converges weakly to \(\mu,\) denoted by \(\mu_n \overset{\mathrm w}{\longrightarrow} \mu,\) if \[\int \varphi \, \mathrm d\mu_n \longrightarrow \int \varphi \, \mathrm d\mu \,, \qquad \forall \varphi \in C_b\,.\]
  2. Let \(X_n,\) \(n \in \mathbb{N}^*,\) and \(X\) be random variables with values in \(\mathbb{R}^d.\) We say that \(X_n\) converges in law, or in distribution, to \(X,\) denoted by \(X_n \overset{\mathrm d}{\longrightarrow}X,\) if \[\mathbb{P}_{X_n} \overset{\mathrm w}{\longrightarrow} \mathbb{P}_X\,.\] Explicitly, this means that \[\mathbb{E}[\varphi(X_n)] \longrightarrow \mathbb{E}[\varphi(X)] \,, \qquad \forall \varphi \in C_b\,.\]
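
To make the definition concrete, here is a minimal Monte Carlo sketch (our own illustration, not part of the notes; the laws below are assumptions chosen for the example). We take \(X_n \sim \mathcal N(1/n, 1)\) and \(X \sim \mathcal N(0,1),\) so that \(X_n \overset{\mathrm d}{\longrightarrow} X,\) and test the bounded continuous function \(\varphi = \arctan,\) for which \(\mathbb{E}[\varphi(X)] = 0\) by symmetry.

```python
# A minimal Monte Carlo sketch of Definition 4.11 (illustrative choices of
# laws): X_n ~ N(1/n, 1) and X ~ N(0, 1), tested against phi = arctan in C_b,
# for which E[phi(X)] = 0 by symmetry. Each X_n is sampled independently:
# only the laws matter, not the underlying probability spaces.
import numpy as np

rng = np.random.default_rng(0)
phi = np.arctan

for n in (1, 10, 100, 1000):
    samples = rng.normal(loc=1.0 / n, scale=1.0, size=10**6)  # samples of X_n
    print(f"n = {n:5d}:  E[phi(X_n)] ~ {phi(samples).mean():+.4f}   (limit 0)")
```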
Remark 4.12

The convergence in law \(\overset{\mathrm d}{\longrightarrow}\) is very different in nature from the other modes of convergence \(\overset{\text{a.s.}}{\longrightarrow},\) \(\overset{\mathbb{P}}{\longrightarrow},\) \(\overset{L^2}{\longrightarrow}\) that we have seen up to now: it only pertains to the laws of the random variables. In particular, the random variables \(X_n\) and \(X\) can all be defined on different probability spaces. Moreover, the limit is (trivially) not unique: if \(X\) and \(Y\) are different random variables with the same law and \(X_n \overset{\mathrm d}{\longrightarrow}X\) then clearly also \(X_n \overset{\mathrm d}{\longrightarrow}Y.\) (In contrast, the limit of weak convergence of probability measures is unique.)

Example 4.13

  1. If \(a_n \to a\) then \(\delta_{a_n} \overset{\mathrm w}{\longrightarrow} \delta_a\) (since \(\int \varphi \, \mathrm d\delta_{a_n} = \varphi(a_n) \to \varphi(a)\) by the continuity of \(\varphi\)).
  2. If the law of \(X_n\) is uniform on \(\{\frac1n, \frac2n, \dots, \frac{n}{n}\}\) and the law of \(X\) is Lebesgue measure on \([0,1],\) then \(X_n \overset{\mathrm d}{\longrightarrow}X\) (by the Riemann sum approximation of integrals of continuous functions; a numerical check is given after this list).
  3. Let \(\mu\) be a probability measure on \(\mathbb{R}\) and define the scaling function \(s^\eta(x) :=\eta x\) for \(\eta > 0.\) Then \(s^\eta_* \mu \overset{\mathrm w}{\longrightarrow}\delta_0\) as \(\eta \to 0.\) To show this, take a function \(\varphi \in C_b\) and write, using the change of variables \(x = s^\eta(y),\) \[\int \varphi(x) \, s^\eta_* \mu(\mathrm dx) = \int \varphi(s^\eta(y)) \, \mu(\mathrm dy) = \int \varphi(\eta y) \, \mu(\mathrm dy) \to \varphi(0)\] as \(\eta \to 0,\) by dominated convergence.
  4. In the important special case where \(\mu(\mathrm dx) = p(x) \, \mathrm dx\) has a density \(p,\) we have \[s^\eta_* \mu(\mathrm dx) = \frac{1}{\eta} p\biggl(\frac{x}{\eta}\biggr) \, \mathrm dx\,.\] The right-hand side is usually known as an approximate delta function. Such functions play a very important role in analysis; one application is given in the Fourier inversion formula in Section 4.4.
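
The following numerical sketch (our own; the test function \(\varphi(x) = \mathrm e^{-x^2} \in C_b\) is an illustrative choice) checks Example 2 via Riemann sums and Example 4 with \(p\) the standard normal density. Both computations are deterministic quadrature; no sampling is needed.

```python
# Numerical sketch of Examples 4.13 (2) and (4), with the illustrative test
# function phi(x) = exp(-x^2) in C_b.
import math
import numpy as np

def phi(x):
    return np.exp(-x**2)

# Example 2: E[phi(X_n)] = (1/n) sum_{i=1}^n phi(i/n) is a Riemann sum,
# converging to E[phi(X)] = int_0^1 phi(x) dx = (sqrt(pi)/2) erf(1).
limit = math.sqrt(math.pi) / 2 * math.erf(1.0)
for n in (10, 100, 1000):
    riemann = phi(np.arange(1, n + 1) / n).mean()
    print(f"n = {n:5d}:  E[phi(X_n)] = {riemann:.6f}   (limit {limit:.6f})")

# Example 4: with p the standard normal density, integrating phi against the
# approximate delta (1/eta) p(x/eta) equals int phi(eta*y) p(y) dy -> phi(0) = 1.
y, dy = np.linspace(-12.0, 12.0, 200001, retstep=True)
p = np.exp(-y**2 / 2) / math.sqrt(2 * math.pi)
for eta in (1.0, 0.1, 0.01):
    val = np.sum(phi(eta * y) * p) * dy
    print(f"eta = {eta:5.2f}:  integral = {val:.6f}   (limit phi(0) = 1)")
```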
Proposition 4.14

If \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) then \(X_n \overset{\mathrm d}{\longrightarrow}X.\)

Proof. We proceed by contradiction and suppose that \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) but \(X_n\) does not converge to \(X\) in law. The latter means that there exists \(\varphi \in C_b\) such that \(\mathbb{E}[\varphi(X_n)] \not\to \mathbb{E}[\varphi(X)].\) Hence, there exists a subsequence \((n_k)\) and \(\varepsilon> 0\) such that \[\tag{4.5} \bigl\lvert \mathbb{E}[\varphi(X_{n_k})] - \mathbb{E}[\varphi(X)] \bigr\rvert \geqslant\varepsilon\] for all \(k.\) Moreover, by Proposition 4.4 (ii), there exists a further subsequence \((n_{k_l})\) such that \(X_{n_{k_l}} \to X\) a.s. as \(l \to \infty.\) But by dominated convergence, we have \[\lvert \mathbb{E}[\varphi(X_{n_{k_l}})] - \mathbb{E}[\varphi(X)] \rvert \to 0\] as \(l \to \infty,\) in contradiction to (4.5).
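
As an illustration of Proposition 4.14 (our own sketch, with illustrative distributional choices): on a single probability space take \(X \sim \mathcal N(0,1)\) and \(X_n := X + W_n\) with \(W_n \sim \mathcal N(0, 1/n)\) independent of \(X,\) so that \(X_n \overset{\mathbb{P}}{\longrightarrow} X;\) the simulation watches both \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon)\) and the gap \(\lvert \mathbb{E}[\varphi(X_n)] - \mathbb{E}[\varphi(X)] \rvert\) for \(\varphi = \arctan.\)

```python
# A small simulation of Proposition 4.14 (illustrative choice of laws).
# On one probability space, X ~ N(0,1) and X_n := X + W_n with W_n ~ N(0, 1/n)
# independent of X, so X_n -> X in probability; both the probability of an
# eps-deviation and the gap in E[phi] should tend to zero.
import numpy as np

rng = np.random.default_rng(1)
phi = np.arctan
eps = 0.1
X = rng.normal(size=10**6)  # one fixed sample of X

for n in (1, 10, 100, 1000):
    Xn = X + rng.normal(scale=1.0 / np.sqrt(n), size=X.size)  # W_n ~ N(0, 1/n)
    prob = np.mean(np.abs(Xn - X) > eps)
    gap = abs(phi(Xn).mean() - phi(X).mean())
    print(f"n = {n:5d}:  P(|X_n - X| > eps) ~ {prob:.4f}   E-gap ~ {gap:.5f}")
```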

Remark 4.15

The reverse implication of Proposition 4.14 is false. Worse: if \(X_n \overset{\mathrm d}{\longrightarrow}X\) then the very statement \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) is in general meaningless! This is because \(X_n \overset{\mathrm d}{\longrightarrow}X\) does not imply that \(X_n\) and \(X\) are all defined on the same probability space, while \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) requires all random variables to be defined on the same probability space (see Remark 4.12). Even when all random variables are defined on the same probability space, it is easy to think of counterexamples. For example, let \(X\) have a Bernoulli law with parameter \(p = 1/2\) and set \(X_n :=1 - X\) for all \(n.\) Then clearly \(\mathbb{P}_{X_n} = \mathbb{P}_X,\) so that \(X_n \overset{\mathrm d}{\longrightarrow}X,\) but because \(\lvert X - X_n \rvert = 1\) a.s., clearly \(X_n\) does not converge to \(X\) in probability.

However, if \(X_n \overset{\mathrm d}{\longrightarrow}a\) for some constant \(a\) then \(X_n \overset{\mathbb{P}}{\longrightarrow} a\) does hold. To show this, let \(\varepsilon> 0\) and define the continuous bounded function \[\varphi(x) :=\frac{\lvert x - a \rvert}{\varepsilon} \wedge 1\,.\] (Plot this function!) Since \(\varphi = 1\) on the set \(\{\lvert x - a \rvert > \varepsilon\},\) we have \(\mathbf 1_{\lvert x - a \rvert > \varepsilon} \leqslant \varphi(x)\) pointwise, and hence \[\mathbb{P}(\lvert X_n - a \rvert > \varepsilon) = \mathbb{E}[\mathbf 1_{\lvert X_n - a \rvert > \varepsilon}] \leqslant\mathbb{E}[\varphi(X_n)] \to \varphi(a) = 0\] as \(n \to \infty,\) by the assumption \(X_n \overset{\mathrm d}{\longrightarrow}a.\)
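
Here is a short plotting snippet (our own sketch; the values \(a = 0\) and \(\varepsilon = 1/2\) are illustrative) for the truncation function \(\varphi\) above, which makes the bound \(\mathbf 1_{\lvert x - a \rvert > \varepsilon} \leqslant \varphi(x)\) visually evident.

```python
# Plot of the truncation function phi(x) = min(|x - a| / eps, 1) from the
# remark above, with the illustrative choices a = 0, eps = 0.5. Note that phi
# is continuous, bounded by 1, phi(a) = 0, and phi = 1 wherever |x - a| > eps.
import numpy as np
import matplotlib.pyplot as plt

a, eps = 0.0, 0.5
x = np.linspace(-2, 2, 1001)
phi = np.minimum(np.abs(x - a) / eps, 1.0)

plt.plot(x, phi, label=r"$\varphi(x) = (|x-a|/\varepsilon) \wedge 1$")
plt.axvline(a - eps, linestyle="--", color="gray")  # edges of the eps-window
plt.axvline(a + eps, linestyle="--", color="gray")
plt.legend()
plt.show()
```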

It turns out that weak convergence is a remarkably versatile concept, and there are many very useful equivalent criteria characterising it. The following proposition is the first step in this direction. To state it, we use the notation \(C_c \equiv C_c(\mathbb{R}^d)\) to denote the space of continuous functions of compact support. We recall the supremum norm \[\lVert \varphi \rVert_\infty :=\sup_{x \in \mathbb{R}^d} \lvert \varphi(x) \rvert\] for any \(\varphi \in C_b.\)
Proposition 4.16

Let \(H \subset C_c\) be such that the closure of \(H\) under \(\lVert \cdot \rVert_\infty\) contains \(C_c.\) Let \(\mu_n\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) Then the following are equivalent.

  1. \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu\) (i.e. \(\forall \varphi \in C_b,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu\)).
  2. \(\forall \varphi \in C_c,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu.\)
  3. \(\forall \varphi \in H,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu.\)

Proof. The implications (i)\(\Rightarrow\)(ii) and (i)\(\Rightarrow\)(iii) are obvious. We shall show (ii)\(\Rightarrow\)(i) and (iii)\(\Rightarrow\)(ii).

To show (ii)\(\Rightarrow\)(i), suppose (ii). Let \(\varphi \in C_b.\) Choose a sequence \(f_k \in C_c\) such that \(0 \leqslant f_k \leqslant 1\) and \(f_k \uparrow 1\) as \(k \to \infty\) (you can take for instance \(f_k(x) = (1 - \lvert x/k \rvert)_+\)). Then we telescope \[\begin{aligned} \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu &= \int \varphi \, \mathrm d\mu_n - \int \varphi f_k \, \mathrm d\mu_n \\ &\quad+ \int \varphi f_k \, \mathrm d\mu_n - \int \varphi f_k \, \mathrm d\mu \\ &\quad+ \int \varphi f_k \, \mathrm d\mu - \int \varphi \, \mathrm d\mu\,, \end{aligned}\] and estimate each line on the right-hand side separately.

  • For any \(k \in \mathbb{N}^*,\) the second line tends to zero as \(n \to \infty,\) by assumption (ii) since \(\varphi f_k \in C_c.\)

  • For any \(k \in \mathbb{N}^*,\) the first line is estimated in absolute value by \[\lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu_n\biggr) \underset{n \to \infty}{\longrightarrow} \lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,,\] where we again used (ii) since \(f_k \in C_c.\)

  • The third line is estimated in absolute value by \[\lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,.\]

Putting everything together, we conclude that for any \(k \in \mathbb{N}^*\) we have \[\limsup_{n \to \infty} \biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu \biggr\rvert \leqslant 2 \lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,.\] Since \(k\) was arbitrary, we can take \(k \to \infty,\) under which the right-hand side tends to zero by dominated convergence (a numerical illustration of this truncation step is given after the proof). This concludes the proof of (ii)\(\Rightarrow\)(i).

To show (iii)\(\Rightarrow\)(ii), suppose (iii). Let \(\varphi \in C_c.\) Choose a sequence \(\varphi_k \in H\) such that \(\lVert \varphi_k - \varphi \rVert_{\infty} \to 0\) as \(k \to \infty.\) Then for any \(k \in \mathbb{N}^*\) we have, again by telescoping, \[\begin{aligned} &\limsup_{n \to \infty} \biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu \biggr\rvert \\ &\quad \leqslant \limsup_{n \to \infty} \Biggl(\biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi_k \, \mathrm d\mu_n \biggr\rvert + \biggl\lvert \int \varphi_k \, \mathrm d\mu_n - \int \varphi_k \, \mathrm d\mu \biggr\rvert + \biggl\lvert \int \varphi_k \, \mathrm d\mu - \int \varphi \, \mathrm d\mu \biggr\rvert\Biggr) \\ &\quad \leqslant 2 \lVert \varphi - \varphi_k \rVert_\infty \underset{k \to \infty}{\longrightarrow} 0\,, \end{aligned}\] where we used that for any \(k \in \mathbb{N}^*,\) the middle term on the second line tends to zero as \(n \to \infty\) by (iii). This concludes the proof of (iii)\(\Rightarrow\)(ii).
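
Finally, a numerical sketch (our own, with \(\mu = \mathcal N(0,1)\) as an illustrative choice) of the truncation step in the proof of (ii)\(\Rightarrow\)(i): the tail quantity \(1 - \int f_k \, \mathrm d\mu,\) which controls the first and third lines of the telescoping estimate, tends to zero as \(k \to \infty.\)

```python
# Numerical check of the truncation step in the proof of (ii) => (i): for
# mu = N(0,1) (an illustrative choice) we compute 1 - int f_k dmu with
# f_k(x) = (1 - |x/k|)_+ and watch it tend to zero as k grows.
import math
import numpy as np

x, dx = np.linspace(-40.0, 40.0, 400001, retstep=True)
density = np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)  # density of mu

for k in (1, 2, 5, 10, 20):
    f_k = np.clip(1 - np.abs(x / k), 0, None)  # tent function, support [-k, k]
    tail = 1 - np.sum(f_k * density) * dx
    print(f"k = {k:3d}:  1 - int f_k dmu ~ {tail:.6f}")
```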
