The origin of this term is rather convoluted. In English, a portmanteau is traditionally a large leather suitcase that opens into two equal parts, used chiefly to carry coats. (Confusingly, although the word is obviously of French origin, in French the word portemanteau means something altogether different: a standing piece of furniture on which one can hang coats.) In his 1871 work Through the Looking-Glass, the sequel to Alice in Wonderland, Lewis Carroll coined the term portmanteau to denote a word obtained by gluing pieces of other words together (such as “motel” from “motor” and “hotel”). It is in this spirit that the following proposition is to be understood, although admittedly the link is not obvious.
Before stating the portmanteau theorem, we remark that \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu\) does not in general imply that \(\mu_n(B) \to \mu(B)\) for all \(B \in \mathcal B(\mathbb{R}^d).\) A counterexample is provided by Example 4.13 (i): \(\delta_{1/n} \overset{\mathrm w}{\longrightarrow}\delta_0,\) but \(\delta_{1/n}(\{0\}) = 0,\) which does not converge to \(\delta_0(\{0\}) = 1.\) We shall see that weak convergence is equivalent to the convergence \(\mu_n(B) \to \mu(B)\) for a restricted class of sets \(B \in \mathcal B(\mathbb{R}^d),\) namely those whose boundary \(\partial B = \bar B \setminus \overset{\circ}{B}\) has measure zero under the limit \(\mu.\)
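The failure of setwise convergence in this example is easy to check numerically. The following sketch (plain Python; the test function is an arbitrary illustrative choice, not part of the text) evaluates \(\int \varphi \, \mathrm d\delta_{1/n} = \varphi(1/n)\) for a bounded continuous \(\varphi\) and contrasts this with the masses assigned to the single set \(\{0\}.\)

```python
import math

# Weak convergence of delta_{1/n} to delta_0: for a bounded continuous test
# function phi, the integral against delta_{1/n} is just phi(1/n), which
# converges to phi(0), the integral against delta_0.
phi = lambda x: math.exp(-x * x)          # an illustrative bounded continuous phi

for n in (1, 10, 100, 1000):
    print(n, phi(1.0 / n))                # tends to phi(0) = 1.0

# In contrast, the masses of the single set B = {0} do not converge:
# delta_{1/n}({0}) = 0 for every n, while delta_0({0}) = 1.  Note that the
# boundary of {0} is {0} itself, which carries full delta_0-mass.
```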
Let \(\mu_n\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) Then the following are equivalent.
- (i) \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu.\)
- (ii) For any open \(G \subset \mathbb{R}^d,\) \(\liminf_{n} \mu_n(G) \geqslant\mu(G).\)
- (iii) For any closed \(F \subset \mathbb{R}^d,\) \(\limsup_n \mu_n(F) \leqslant\mu(F).\)
- (iv) For any \(B \in \mathcal B(\mathbb{R}^d)\) such that \(\mu(\partial B) = 0,\) \(\lim_n \mu_n(B) = \mu(B).\)
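Before turning to the proof, here is a small numerical sanity check of criteria (ii) and (iv) (a Python sketch; the measures and sets below are purely illustrative choices, not part of the statement). We take \(\mu_n\) to be the uniform measure on \(\{1/n, 2/n, \dots, 1\},\) which converges weakly to the uniform measure \(\mu\) on \([0,1].\)

```python
import numpy as np

def mu_n(indicator, n):
    """Mass assigned to a set by the uniform measure on {1/n, 2/n, ..., 1}."""
    pts = np.arange(1, n + 1) / n
    return indicator(pts).mean()

# Illustrative sets:
closed_B = lambda x: (x >= 0.0) & (x <= 0.5)   # boundary {0, 1/2} has mu-measure 0
open_G   = lambda x: (x > 0.5) & (x < 1.0)     # an open set

for n in (10, 100, 1000, 10000):
    # (iv): mu_n(B) should converge to mu(B) = 1/2.
    # (ii): liminf_n mu_n(G) should be >= mu(G) = 1/2.
    print(n, mu_n(closed_B, n), mu_n(open_G, n))
```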
Proof. We prove the implications (i)\(\Rightarrow\)(ii), (ii)\(\Leftrightarrow\)(iii), (ii), (iii)\(\Rightarrow\)(iv), and (iv)\(\Rightarrow\)(i), which together yield all equivalences.
(i)\(\Rightarrow\)(ii). Let \(G\) be open. Then there exists a sequence \(\varphi_k \in C_b\) such that \(0 \leqslant\varphi_k \leqslant\mathbf 1_{G}\) and \(\varphi_k \uparrow \mathbf 1_{G}.\) For instance, we can take \[\varphi_k(x) :=(k \mathop{\mathrm{dist}}(x, G^c)) \wedge 1\,.\] (Note that the property \(\varphi_k \uparrow \mathbf 1_{G}\) holds because \(G\) is open.) Since \(\varphi_k \leqslant\mathbf 1_{G},\) we find \[\liminf_n \mu_n(G) \geqslant\sup_k \biggl(\liminf_n \int \varphi_k \, \mathrm d\mu_n\biggr) = \sup_k \int \varphi_k \, \mathrm d\mu = \mu(G)\,,\] where the first equality follows from (i) and the last step follows by monotone convergence.
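To make the construction of \(\varphi_k\) concrete, here is a small numerical sketch (Python; the choice \(G = (0,1) \subset \mathbb{R}\) is only an example) of \(\varphi_k(x) = (k \mathop{\mathrm{dist}}(x, G^c)) \wedge 1\) and its pointwise increase towards \(\mathbf 1_G.\)

```python
# For G = (0, 1) in R, dist(x, G^c) = min(x, 1 - x) for x in G and 0 otherwise.
def phi_k(x, k):
    dist_to_complement = max(0.0, min(x, 1.0 - x))   # dist(x, G^c) for G = (0, 1)
    return min(k * dist_to_complement, 1.0)

x = 0.01                                   # a point of G close to the boundary
for k in (1, 10, 100, 1000):
    print(k, phi_k(x, k))                  # increases to 1_G(x) = 1

print(phi_k(-0.5, 1000), phi_k(1.0, 1000)) # outside G and on its boundary: stays 0
```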
(ii)\(\Leftrightarrow\)(iii). This is obvious by taking \(F = G^c.\)
(ii), (iii)\(\Rightarrow\)(iv). Let \(B \in \mathcal B(\mathbb{R}^d).\) Then by (iii) we have \[\limsup_n \mu_n(B) \leqslant\limsup_n \mu_n(\bar B) \leqslant\mu(\bar B)\] and by (ii) we have \[\liminf_n \mu_n(B) \geqslant\liminf_n \mu_n(\overset{\circ}{B}) \geqslant\mu(\overset{\circ}{B})\,.\] If \(\mu(\partial B) = 0\) then \(\mu(\bar B) = \mu(\overset{\circ}{B}) = \mu(B),\) and we conclude (iv).
(iv)\(\Rightarrow\)(i). This is the last remaining implication. Let \(\varphi \in C_b\) and suppose without loss of generality that \(\varphi \geqslant 0\) (otherwise split \(\varphi = \varphi_+ - \varphi_-\) with \(\varphi_+, \varphi_- \geqslant 0\)). With \(K :=\sup_x \varphi(x)\) we have (recall Exercise 2.4) \[\int \varphi(x) \, \mu(\mathrm dx) = \int \int_0^K \mathbf 1_{t \leqslant\varphi(x)} \, \mathrm dt \, \mu(\mathrm dx) = \int_0^K \mu(E_t^\varphi) \, \mathrm dt\,,\] where in the last step we used Fubini’s theorem and defined \[E^\varphi_t :=\{x \in \mathbb{R}^d \,\colon\varphi(x) \geqslant t\}\,.\] By the same argument, \[\int \varphi(x) \, \mu_n(\mathrm dx) = \int_0^K \mu_n(E_t^\varphi) \, \mathrm dt\,.\] To conclude the argument, we make two claims.
First, \(\partial E_t^\varphi \subset \{x \in \mathbb{R}^d \,\colon\varphi(x) = t\}.\) To see this, we note that since \(\varphi\) is continuous, \(E^\varphi_t\) is closed as the preimage of a closed set. Moreover, \[\overset{\circ}{E}{}^\varphi_t \supset \{x \in \mathbb{R}^d \,\colon\varphi(x) > t\}\,,\] since the right-hand side is open (as the preimage of an open set) and contained in \(E^\varphi_t.\) Hence, \[\partial E^\varphi_t = E^\varphi_t \setminus \overset{\circ}{E}{}^\varphi_t \subset \{x \in \mathbb{R}^d \,\colon\varphi(x) = t\}\,,\] as claimed.
Second, the set \[\bigl\{t \in [0,K] \,\colon\mu(\{x \,\colon\varphi(x) = t\}) > 0\bigr\}\] is at most countable. This follows from the observation that this set can be written as \[\bigcup_{k \geqslant 1} \biggl\{t \in [0,K] \,\colon\mu(\{x \,\colon\varphi(x) = t\}) \geqslant\frac{1}{k}\biggr\}\,,\] and for each \(k \geqslant 1\) the set on the right-hand side has cardinality at most \(k\) (the sets \(\{x \,\colon\varphi(x) = t\}\) are pairwise disjoint for distinct \(t,\) and \(\mu\) has total measure \(1\)), in particular it is finite.
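For instance (purely as an illustration of this claim), take \(\mu = \frac12 \delta_0 + \frac12 \lambda|_{[0,1]}\) with \(\lambda\) Lebesgue measure, and \(\varphi(x) = (1 - |x|) \vee 0,\) so that \(K = 1.\) For \(t \in (0,1)\) the level set \(\{x \,\colon\varphi(x) = t\} = \{\pm(1-t)\}\) consists of two points and has \(\mu\)-measure zero, and the same holds for \(t = 0;\) only \(t = 1,\) with level set \(\{0\}\) of mass \(\frac12,\) contributes. The exceptional set is thus \(\{1\},\) which is indeed countable.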
Putting both claims together, we use (iv) to conclude that \(\mu_n(E_t^\varphi) \to \mu(E^\varphi_t)\) as \(n \to \infty\) for almost all \(t.\) Hence, by dominated convergence we have \[\int \varphi(x) \, \mu_n(\mathrm dx) = \int_0^K \mu_n(E_t^\varphi) \, \mathrm dt \longrightarrow \int_0^K \mu(E_t^\varphi) \, \mathrm dt= \int \varphi(x) \, \mu(\mathrm dx)\,,\] as desired.
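As a numerical sanity check of the layer-cake formula used in this last step (a rough Riemann-sum approximation in Python; the measure and test function are arbitrary illustrative choices), one can compare \(\int \varphi \, \mathrm d\mu\) with \(\int_0^K \mu(E_t^\varphi) \, \mathrm dt\) for \(\mu\) the uniform measure on \([0,1]\) and \(\varphi(x) = \mathrm e^{-x}.\)

```python
import numpy as np

# Layer-cake check for mu = uniform measure on [0, 1] and phi(x) = exp(-x):
# compare  int phi dmu  with  int_0^K mu(E_t^phi) dt,  where E_t^phi = {phi >= t}.
x = np.linspace(0.0, 1.0, 2001)            # grid approximating mu
phi = np.exp(-x)
K = phi.max()                              # sup of phi

lhs = phi.mean()                           # Riemann approximation of int phi dmu

t = np.linspace(0.0, K, 2001)
mu_Et = np.array([(phi >= s).mean() for s in t])   # approximate mu(E_t^phi)
rhs = K * mu_Et.mean()                     # Riemann approximation of int_0^K mu(E_t^phi) dt

print(lhs, rhs)                            # both close to 1 - 1/e ≈ 0.632
```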
As a corollary of the portmanteau theorem, we deduce yet another criterion for convergence in law on \(\mathbb{R}\): pointwise convergence of the distribution function at its points of continuity (think again of Example 4.13 (i) for why the restriction to continuity points is needed).
Let \(X_n, X\) be real-valued random variables. Then \(X_n \overset{\mathrm d}{\longrightarrow}X\) if and only if \(F_{X_n}(x) \to F_X(x)\) for all \(x\) where \(F_X\) is continuous.
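For a concrete numerical illustration (a plain Python sketch, not part of the proof), consider again \(X_n = 1/n\) and \(X = 0\) from Example 4.13 (i): the distribution functions converge everywhere except at the single discontinuity point \(x = 0\) of \(F_X.\)

```python
# F_{X_n}(x) = P(1/n <= x) and F_X(x) = P(0 <= x) for the deterministic
# random variables X_n = 1/n and X = 0.
F_n = lambda x, n: 1.0 if x >= 1.0 / n else 0.0
F   = lambda x: 1.0 if x >= 0.0 else 0.0

for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
    values = [F_n(x, n) for n in (1, 10, 100, 1000)]
    print(x, values, F(x))
# At every continuity point x != 0 of F, F_n(x) converges to F(x); at the
# discontinuity point x = 0 we have F_n(0) = 0 for all n while F(0) = 1, so
# convergence fails there, which the proposition allows.
```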
Proof. The “only if” implication is immediate from Proposition 4.17 (iv). Indeed, by Proposition 4.17 (iv), convergence in law implies that \[F_{X_n}(x) = \mathbb{P}(X_n \leqslant x) = \mathbb{P}_{X_n}((-\infty, x]) \longrightarrow \mathbb{P}_{X}((-\infty, x]) = \mathbb{P}(X \leqslant x) = F_{X}(x)\] for all \(x \in \mathbb{R}\) such that \(\mathbb{P}_{X}(\{x\}) = \mathbb{P}(X = x) = 0\) (since \(\partial (-\infty, x] = \{x\}\)). Moreover, if \(F_X\) is continuous at \(x,\) then \(\lim_{n \to \infty} F_X(x - 1/n) = F_X(x),\) which implies that \(\mathbb{P}(X = x) = \mathbb{P}(X \leqslant x) - \mathbb{P}(X < x) = 0.\)
For the “if” implication, we abbreviate \(\mu :=\mathbb{P}_X\) and \(F :=F_{X}\) as well as \(\mu_n :=\mathbb{P}_{X_n}\) and \(F_n :=F_{X_n}.\) First we claim that the set \(D\) of points of discontinuity of \(F\) is at most countable. This is an exercise in real analysis that we recall here. Since \(F\) is right-continuous and nondecreasing, at any \(x \in D\) we have \(F(x-) :=\lim_{y \uparrow x}F(y) < F(x).\) Hence, there exists \(q(x) \in \mathbb{Q}\cap (F(x-), F(x)).\) By monotonicity of \(F,\) the map \(q \,\colon D \to \mathbb{Q}\) is injective, which proves the claim. In particular, the set \(\mathbb{R}\setminus D\) of points of continuity of \(F\) is dense in \(\mathbb{R}.\)
Next, by right-continuity and by definition of \(F(x-),\) for any \(x \in \mathbb{R}\) and any \(\varepsilon> 0,\) there exists \(\delta > 0\) such that \[F(x + \delta) \leqslant F(x) + \varepsilon\,, \qquad F(x - \delta) \geqslant F(x-) - \varepsilon\,.\] Choosing \(a,b \in \mathbb{R}\setminus D\) satisfying \(x - \delta \leqslant a < x \leqslant b \leqslant x + \delta\) (which is possible by density of \(\mathbb{R}\setminus D\)), we have, by assumption and by monotonicity of \(F,\) \[\lim_n F_n(a) = F(a) \geqslant F(x - \delta) \geqslant F(x-) - \varepsilon\,,\] which implies \[\liminf_n F_n(x-) \geqslant\liminf_n F_n(a) \geqslant F(x-) - \varepsilon\,.\] Since \(\varepsilon> 0\) was arbitrary, we conclude that \[\tag{4.6} \liminf_n F_n(x-) \geqslant F(x-)\,.\]
Let us repeat the same argument for the \(\limsup\): \[\lim_n F_n(b) = F(b) \leqslant F(x + \delta) \leqslant F(x) + \varepsilon\,,\] which implies \[\limsup_n F_n(x) \leqslant\limsup_n F_n(b) \leqslant F(x) + \varepsilon\,.\] Since \(\varepsilon> 0\) was arbitrary, we conclude that \[\tag{4.7} \limsup_n F_n(x) \leqslant F(x)\,.\]
Combining (4.6) and (4.7), we first obtain a lower bound for open intervals. Let \(I = (u,v)\) with \(-\infty \leqslant u < v \leqslant +\infty\) be an open interval. Since \(\mu_n(I) = F_n(v-) - F_n(u)\) (with the conventions \(F_n(u) := 0\) if \(u = -\infty\) and \(F_n(v-) := 1\) if \(v = +\infty,\) and similarly for \(F\)), the estimates (4.6) and (4.7) (which are trivial for infinite endpoints under these conventions) yield \[\tag{4.8} \liminf_n \mu_n(I) \geqslant\liminf_n F_n(v-) - \limsup_n F_n(u) \geqslant F(v-) - F(u) = \mu(I)\,.\] To obtain the case of a general open set, we recall from analysis that any open set \(G\) can be written as a countable disjoint union of open intervals \(I_k,\) i.e. \(G = \bigcup_{k \geqslant 1} I_k.\) (For the proof, we simply decompose \(G\) into its connected components, which are intervals, and note that, since each such interval contains a point in \(\mathbb{Q}\) unique to that interval, there are at most countably many intervals.) Hence, \[\begin{gathered} \liminf_n \mu_n(G) = \liminf_n \mu_n \biggl(\bigcup_{k \geqslant 1} I_k\biggr) = \liminf_n \sum_{k \geqslant 1} \mu_n(I_k) \\ \geqslant\sum_{k \geqslant 1} \liminf_n \mu_n(I_k) \geqslant \sum_{k \geqslant 1} \mu(I_k) = \mu(G)\,, \end{gathered}\] where in the third step we used Fatou’s lemma (for the counting measure \(\sum_{k \geqslant 1}\)), and in the fourth step we used (4.8). We have therefore verified Proposition 4.17 (ii) for a general open set \(G,\) and the portmanteau theorem yields \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu,\) i.e. \(X_n \overset{\mathrm d}{\longrightarrow}X.\)
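Finally, here is a small numerical illustration of the interval bound (4.8) (plain Python; the choice \(\mu_n = \delta_{1/n},\) \(\mu = \delta_0\) is again just the example from above), showing in particular that the inequality can be strict.

```python
# Illustration of (4.8) for mu_n = delta_{1/n} and mu = delta_0.
# For these measures, mu_n((u, v)) = 1 if u < 1/n < v and 0 otherwise,
# and mu((u, v)) = 1 if u < 0 < v and 0 otherwise.
def mass_delta(point, u, v):
    return 1.0 if u < point < v else 0.0

for (u, v) in [(-1.0, 2.0), (0.0, 2.0), (-1.0, 0.0)]:
    masses = [mass_delta(1.0 / n, u, v) for n in (1, 10, 100, 1000)]
    print((u, v), masses, "limit measure:", mass_delta(0.0, u, v))
# (-1, 2): mu_n(I) = 1 for every n and mu(I) = 1  -> liminf mu_n(I) = mu(I).
# (0, 2) : mu_n(I) = 1 for every n but mu(I) = 0  -> the inequality (4.8) is strict.
# (-1, 0): mu_n(I) = 0 for every n and mu(I) = 0  -> (4.8) holds with equality.
```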