$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bbQ}{{\mathbb Q}} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\condtrans}[3]{p_{#1}(#2|#3)} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\cro}[1]{\langle #1 \rangle} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Hset}{\mathsf{H}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\jointtrans}[3]{p_{#1}(#2,#3)} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lp}[1]{\mathsf{L}_{{#1}}} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} 
\newcommand{\lrb}[1]{\left[#1 \right]} \newcommand{\lrc}[1]{\left\{#1 \right\}} \newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcD}{\mathcal{D}} \newcommand{\mcf}{\mathcal{F}} \newcommand{\mcl}{\mathcal{L}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\N}{\mathcal{N}} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\pminfty}{_{-\infty}^\infty} \newcommand{\PP}{\mathbb P} \newcommand{\projorth}[1]{\mathsf{P}^\perp_{#1}} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\revcondtrans}[3]{q_{#1}(#2|#3)} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\weaklim}[1]{\stackrel{\mathcal{L}_{#1}}{\rightsquigarrow}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} 
\newcommand{\Zset}{\mathsf{Z}} $$
Definition: Let $K$ be a convex set. A point $x \in K$ is an extreme point of $K$ if it cannot be written as a nontrivial convex combination of points of $K$: $$ \forall y,z \in K,\ \forall \lambda \in (0,1),\ x = \lambda y + (1-\lambda)z \Rightarrow y = z = x. $$ We denote by $\mathrm{Extr}(K)$ the set of extreme points of $K$.
Theorem (Krein–Milman). Let $E$ be a Euclidean space of dimension $n$ and let $K \subset E$ be convex and compact. Then $$ K = \mathrm{conv}(\mathrm{Extr}(K)), $$ where the notation $\mathrm{conv}(G)$ stands for the convex hull of the set $G$.
Proof.
It suffices to show $K \subset \mathrm{conv}(\mathrm{Extr}(K))$, the reverse inclusion being clear by convexity of $K$.
The proof proceeds by induction on $n$, the dimension of $E$. The case $n=0$ is trivial, so we assume the property holds in every Euclidean space of dimension $n-1$. Without loss of generality, we may also assume that $K$ is not reduced to a singleton.
Let $x \in K$ and choose $y \in K \setminus \{x\}$ arbitrarily. Consider a maximal segment $[a,b]$ contained in $K$ and containing $[x,y]$. Since $a,b \in K$ and $x$ is a convex combination of $a$ and $b$, it suffices to show that $a,b \in \mathrm{conv}(\mathrm{Extr}(K))$; we treat the case of $a$ (the case of $b$ is similar). By maximality of the segment, $a \in \partial K$, so there exists a sequence $(a_k)$ with $a_k \notin K$ converging to $a$, and for each $k$, separating the compact convex set $K$ from the point $a_k$ yields a unit vector $u_k$ such that $\langle z-a_k,u_k \rangle \geq 0$ for all $z \in K$. Extracting from $(u_k)$ a subsequence converging on the unit sphere to some $u$ and passing to the limit gives $\langle z-a,u \rangle \geq 0$ for all $z \in K$. Thus, setting $f(z)=\langle z-a,u \rangle$, we have $f \geq 0$ on $K$ and $f(a)=0$, so that $a \in H_a := f^{-1}(0) \cap K=\mathrm{argmin}_{z\in K} f(z)$. The set $H_a$ is convex, compact, non-empty, and contained in an affine hyperplane, i.e. a space of dimension $n-1$; the induction hypothesis then shows that $a \in \mathrm{conv}(\mathrm{Extr}(H_a))$. It remains to check that $\mathrm{Extr}(H_a) \subset \mathrm{Extr}(K)$. Indeed, let $z \in \mathrm{Extr}(H_a)$ and suppose $z=\lambda v + (1-\lambda) w$ with $v,w \in K$ and $\lambda \in (0,1)$. Since $f$ is affine with $f \geq 0$ on $K$, the identity $f(z)=0=\lambda f(v) + (1-\lambda) f(w)$ forces $f(v)=f(w)=0$, hence $v,w \in H_a$, and therefore $v=w=z$ because $z \in \mathrm{Extr}(H_a)$. This shows that $z \in \mathrm{Extr}(K)$ and the proof is concluded. $\Box$
Corollary. Let $E$ be a Euclidean space and $K \subset E$ convex and compact. Let $f : E \to \mathbb{R}$ be linear. Then $f$ attains its minimum on $K$ at a point of $\mathrm{Extr}(K)$.
Proof.
Let $x \in \mathrm{argmin}_{y\in K} f(y)$. By Krein–Milman, there exist $y_1,\ldots,y_m \in \mathrm{Extr}(K)$ such that $x$ is a convex combination of the $y_i$. Since $f$ is linear, $f(x)$ is the same convex combination of the values $f(y_i)$, each of which satisfies $f(y_i) \geq f(x) = \min_{y\in K} f(y)$; hence $f(y_i)=f(x)$ for every $i$ with positive weight, and the minimum is attained at an extreme point, which completes the proof. $\Box$
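As a finite-dimensional sanity check (an illustrative sketch, not part of the text; all names are ours), take $K$ to be the convex hull of a finite planar point set: its extreme points are exactly the convex-hull vertices, and a linear functional attains its minimum at one of them.

```python
# Illustration of the corollary for K = conv of a finite planar point set:
# the extreme points are the hull vertices (Andrew's monotone chain),
# and a linear functional attains its minimum at one of them.

def convex_hull(points):
    """Return the hull vertices of a 2D point set, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Unit square corners plus interior points: only the corners are extreme.
K = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (0.25, 0.75)]
extr = convex_hull(K)
assert sorted(extr) == [(0, 0), (0, 1), (1, 0), (1, 1)]

f = lambda p: 2*p[0] - p[1]          # a linear functional
# the minimum over K equals the minimum over the extreme points
assert min(f(p) for p in K) == min(f(p) for p in extr)
```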
Theorem (Birkhoff–von Neumann). Let $B_n$ be the set of bistochastic (doubly stochastic) matrices of size $n\times n$, i.e. $$ B_n = \left\{ A = (a_{ij}) \in \mathcal{M}_n([0,1]) \;\middle|\; \sum_{j=1}^n a_{ij} = 1 \text{ for all } i,\ \sum_{i=1}^n a_{ij} = 1 \text{ for all } j \right\}. $$ Then $\mathrm{Extr}(B_n) = P_n$, the set of permutation matrices (i.e. matrices having exactly one $1$ in each row and each column and zeros everywhere else).
Proof.
A permutation matrix is clearly extreme. Conversely, we prove by induction on $n$ that any extreme matrix is a permutation matrix; the case $n=1$ is trivial. Assume the claim holds for $n-1$ and take $A \in \mathrm{Extr}(B_n)$.
We first show by contradiction that $A$ has at most $2n-1$ nonzero entries. Otherwise there exist $2n$ distinct pairs $(i_k,j_k)$ such that $a_{i_k j_k} > 0$ for all $k \in [1:2n]$. Define $F_1=\mathrm{Vect}(E_{i_k,j_k},\ k \in [1:2n])$, where $E_{i,j}$ denotes the matrix whose only nonzero entry is a $1$ at position $(i,j)$. Let $F_2$ be the vector space of matrices whose row and column sums are all zero (a basis is given by the matrices $E_{ij} - E_{in} - E_{nj} + E_{nn}$ with $i,j \in [1:n-1]$). Then $\dim F_1=2n$ and $\dim F_2=(n-1)^2$, so $F_1$ and $F_2$ cannot be in direct sum: the sum of their dimensions is $n^2+1$, which exceeds the dimension $n^2$ of the space of $n \times n$ matrices. Thus there exists a nonzero matrix $N \in F_1 \cap F_2$. Since $N$ has zero row and column sums and is supported on entries where $A$ is positive, $A \pm \epsilon N$ is bistochastic for $\epsilon$ small enough, and we would have $$ A=\frac{1}{2}(A+\epsilon N)+\frac{1}{2}(A-\epsilon N), $$ which contradicts that $A$ is extreme.
Thus $A$ has at most $2n-1$ nonzero entries. Since each of its $n$ rows contains at least one nonzero entry (its entries sum to $1$), some row contains exactly one, and that entry must equal $1$. The other entries in the corresponding column must then be $0$. Removing this row and column, we obtain a matrix of size $n-1$ which is still bistochastic and still extreme. By the induction hypothesis it is a permutation matrix, hence so is $A$, which concludes the induction. $\Box$
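The theorem is constructive in spirit: repeatedly finding a permutation supported on the positive entries and subtracting it yields a decomposition of a bistochastic matrix as a convex combination of permutation matrices. A minimal Python sketch (brute-force permutation search, fine only for small $n$; function names are ours):

```python
from itertools import permutations

def birkhoff_decompose(A, tol=1e-12):
    """Greedy Birkhoff decomposition of a bistochastic matrix A
    (list of lists) into weighted permutations. Illustrative only."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    parts = []
    while True:
        # find a permutation supported on the positive entries of A
        sigma = next((p for p in permutations(range(n))
                      if all(A[i][p[i]] > tol for i in range(n))), None)
        if sigma is None:
            break
        w = min(A[i][sigma[i]] for i in range(n))   # largest removable weight
        parts.append((w, sigma))
        for i in range(n):
            A[i][sigma[i]] -= w        # zeroes at least one entry per pass
    return parts

A = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
parts = birkhoff_decompose(A)
assert abs(sum(w for w, _ in parts) - 1.0) < 1e-9   # weights sum to 1
# reconstruct A as the weighted sum of the permutation matrices
R = [[sum(w for w, s in parts if s[i] == j) for j in range(3)]
     for i in range(3)]
assert all(abs(R[i][j] - A[i][j]) < 1e-9 for i in range(3) for j in range(3))
```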
Let \((X_i)_{i\in\mathbb{N}}\) be a family of independent and identically distributed (i.i.d.) random variables defined on a common probability space and taking values in a measurable space \((\mathsf{X}, \mathcal{X})\).
Equivalent Formulation: We can also consider \(X_i\) as the coordinate projection associated with the probability space \((\mathsf{X}^\mathbb{N}, \mathcal{X}^{\otimes\mathbb{N}}, \mathbb{P})\), where, under \(\mathbb{P}\), the sequence \((X_i)\) is i.i.d.
Permutation-Invariant σ-Fields: For any \(n \in \mathbb{N}\), let \(\mathcal{G}_n\) be the σ-field generated by measurable functions \(f: \mathsf{X}^\mathbb{N} \to \mathbb{R}\) that are invariant under any permutation of the first \(n\) coordinates. Define the σ-field $\mathcal{G}_\infty$ as: \[ \mathcal{G}_\infty = \bigcap_{n\in\mathbb{N}} \mathcal{G}_n, \] which is generated by measurable functions invariant under any permutation of a finite number of coordinates.
Statement: For any \(A \in \mathcal{G}_\infty\), \(\mathbb{P}(A) = 0\) or \(1\).
This result is known as the Hewitt-Savage 0-1 Law.
Objective: Show that \(\mathbb{P}(A) = \mathbb{P}(A)^2\), which implies \(\mathbb{P}(A) \in \{0,1\}\).
Key Idea: We aim to establish the identity \(\mathbb{P}(A) = \mathbb{E}[\mathsf{1}_A \mathsf{1}_A] = \mathbb{E}[\mathsf{1}_A]^2 = \mathbb{P}(A)^2\). To do so, the idea is to approximate $\mathsf{1}_A$ by the indicators of two different independent events.
Step 1: Approximation Lemma Let \(\delta > 0\). By the Approximation Lemma, there exist \(n \in \mathbb{N}\) and a set \(B \in \mathcal{F}_n = \sigma(X_{1:n})\) such that: \[ \mathbb{E}\left[|\mathsf{1}_A - \mathsf{1}_B|\right] \leq \delta. \] Since \(B \in \mathcal{F}_n\), there exists \(\bar{B} \in \mathcal{X}^{\otimes n}\) such that: \[ \mathsf{1}_B = \mathsf{1}_{\bar{B}}(X_{1:n}). \] Thus, we have: \[ \mathbb{E}\left[|\mathsf{1}_A - \mathsf{1}_{\bar{B}}(X_{1:n})|\right] \leq \delta. \]
Step 2: Permutation Argument Let \(\pi\) be the permutation of \(\{1, \ldots, 2n\}\) that swaps the first \(n\) coordinates with the next \(n\): \[ \pi(1:2n) = (n+1:2n, 1:n). \] Since \(A \in \mathcal{G}_\infty\), the event \(A\) is invariant under \(\pi\), and, the \((X_i)\) being i.i.d., the sequence \((X_{\pi(1:2n)}, X_{2n+1:\infty})\) has the same distribution as \((X_i)_{i\geq1}\). Therefore: \[ \delta \geq \mathbb{E}\left[|\mathsf{1}_A - \mathsf{1}_{\bar{B}}(X_{1:n})|\right] = \mathbb{E}\left[|\mathsf{1}_A(X_{\pi(1:2n)}, X_{2n+1:\infty}) - \mathsf{1}_{\bar{B}}(X_{\pi(1:n)})|\right] = \mathbb{E}\left[|\mathsf{1}_A - \mathsf{1}_{\bar{B}}(X_{n+1:2n})|\right], \] where the last equality uses the invariance of \(A\) under \(\pi\) and \(X_{\pi(1:n)} = X_{n+1:2n}\).
Step 3: Independence and Expectation The random variables \(\mathsf{1}_{\bar{B}}(X_{1:n})\) and \(\mathsf{1}_{\bar{B}}(X_{n+1:2n})\) are independent, by independence of the \((X_i)\); in particular \(\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})\mathsf{1}_{\bar{B}}(X_{n+1:2n})] = \mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})]\,\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{n+1:2n})]\). Then, by the triangle inequality: $$ |\mathbb{E}[\mathsf{1}_A \mathsf{1}_A] - \mathbb{E}[\mathsf{1}_A]\mathbb{E}[\mathsf{1}_A]| \leq |\mathbb{E}[\mathsf{1}_A \mathsf{1}_A] - \mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})\mathsf{1}_{\bar{B}}(X_{n+1:2n})]| + | \mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})]\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{n+1:2n})]- \mathbb{E}[\mathsf{1}_A] \mathbb{E} [\mathsf{1}_A]| \leq 4\delta. $$ Since \(\delta > 0\) is arbitrary, we conclude that \(\mathbb{E}[\mathsf{1}_A \mathsf{1}_A] = \mathbb{E}[\mathsf{1}_A]^2\), which implies \(\mathbb{P}(A) = \mathbb{P}(A)^2\). Therefore, \(\mathbb{P}(A) \in \{0,1\}\).
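For completeness, here is the term-by-term accounting behind the bound $4\delta$, using Steps 1 and 2 together with $0 \leq \mathsf{1}_A, \mathsf{1}_{\bar B} \leq 1$:

$$ \big|\mathbb{E}[\mathsf{1}_A \mathsf{1}_A] - \mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})\mathsf{1}_{\bar{B}}(X_{n+1:2n})]\big| \leq \mathbb{E}\big[\mathsf{1}_A\,|\mathsf{1}_A - \mathsf{1}_{\bar{B}}(X_{n+1:2n})|\big] + \mathbb{E}\big[|\mathsf{1}_A - \mathsf{1}_{\bar{B}}(X_{1:n})|\,\mathsf{1}_{\bar{B}}(X_{n+1:2n})\big] \leq 2\delta, $$

$$ \big|\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})]\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{n+1:2n})] - \mathbb{E}[\mathsf{1}_A]\mathbb{E}[\mathsf{1}_A]\big| \leq \mathbb{E}\big[|\mathsf{1}_{\bar{B}}(X_{n+1:2n}) - \mathsf{1}_A|\big] + \mathbb{E}\big[|\mathsf{1}_{\bar{B}}(X_{1:n}) - \mathsf{1}_A|\big] \leq 2\delta, $$

and the two middle quantities coincide because $\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})\mathsf{1}_{\bar{B}}(X_{n+1:2n})] = \mathbb{E}[\mathsf{1}_{\bar{B}}(X_{1:n})]\,\mathbb{E}[\mathsf{1}_{\bar{B}}(X_{n+1:2n})]$ by independence.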
Conclusion: The Hewitt–Savage 0-1 law states that any event in the exchangeable σ-field \(\mathcal{G}_\infty\), i.e. any event invariant under permutations of finitely many coordinates, has probability 0 or 1.
Let $(\Xset^{\mathbb{N}},\Xsigma^{\otimes\mathbb{N}},\mathbb{P})$ be a probability space. For each $k\in\mathbb{N}$, define $$ \mathcal{F}_k=\sigma(X_{1:k}), $$ where $X_i$ denotes the coordinate projection on the $i$-th component, that is, $X_i(\omega)=\omega_i$ for $\omega\in\Xset^{\mathbb{N}}$.
Lemma (Approximation Lemma). Any set $A\in\mathcal{X}^{\otimes\mathbb{N}}$ satisfies the approximation property:
For every $\delta>0$, there exist an integer $k\in\mathbb{N}$ and a set $B\in\mathcal{F}_k$ such that $$ \mathbb{E}\big[|\mathbb{1}_A-\mathbb{1}_B|\big]\leq\delta. $$
The proof is a classical application of the monotone class theorem.
Let $\mathcal{M}$ denote the collection of all sets $A\in\mathcal{X}^{\otimes\mathbb{N}}$ that satisfy the approximation property stated in the lemma. Note that $\mathcal{M}$ contains every $\mathcal{F}_k$ (any $A\in\mathcal{F}_k$ satisfies the property with $B=A$), and in particular $\Xset^{\mathbb{N}}\in\mathcal{M}$. We show that $\mathcal{M}$ is a monotone class.
Let $A_0,A_1\in\mathcal{M}$ with $A_0\subset A_1$. We claim that $A_1\setminus A_0\in\mathcal{M}$.
For arbitrary sets $A_0,A_1,B_0,B_1$, the following identity holds: $$ \mathbb{1}_{A_1\setminus A_0}-\mathbb{1}_{B_1\setminus B_0} =\mathbb{1}_{A_1}\mathbb{1}_{A_0^c}-\mathbb{1}_{B_1}\mathbb{1}_{B_0^c} =\mathbb{1}_{A_1}(\mathbb{1}_{A_0^c}-\mathbb{1}_{B_0^c}) +(\mathbb{1}_{A_1}-\mathbb{1}_{B_1})\mathbb{1}_{B_0^c}. $$
Taking absolute values and expectations yields $$ \mathbb{E}\big[|\mathbb{1}_{A_1\setminus A_0}-\mathbb{1}_{B_1\setminus B_0}|\big] \leq \mathbb{E}\big[|\mathbb{1}_{A_0}-\mathbb{1}_{B_0}|\big] + \mathbb{E}\big[|\mathbb{1}_{A_1}-\mathbb{1}_{B_1}|\big]. $$
Since $A_0,A_1\in\mathcal{M}$, both terms on the right-hand side can be made arbitrarily small. Hence, the approximation property holds for $A_1\setminus A_0$.
Let $(A_n)_{n\geq0}$ be an increasing sequence of sets in $\mathcal{M}$, and define $$ A=\bigcup_{n\geq0}A_n. $$
For any set $B$, the triangle inequality gives $$ \mathbb{E}\big[|\mathbb{1}_A-\mathbb{1}_B|\big] \leq \mathbb{E}\big[|\mathbb{1}_A-\mathbb{1}_{A_n}|\big] + \mathbb{E}\big[|\mathbb{1}_{A_n}-\mathbb{1}_B|\big]. $$
Since $\mathbb{1}_{A_n}\uparrow\mathbb{1}_A$ pointwise, the first term converges to zero, while the second term can be made arbitrarily small because $A_n\in\mathcal{M}$. Therefore, $A\in\mathcal{M}$.
Finally, $\mathcal{M}$ contains all the $\mathcal{F}_k$ and is closed under proper differences and increasing limits; closure under decreasing limits follows by taking complements. By the monotone class theorem, $\mathcal{M}$ therefore contains $$ \sigma\Big(\bigcup_{k\geq0}\mathcal{F}_k\Big)=\mathcal{X}^{\otimes\mathbb{N}}. $$ This completes the proof.
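As a numerical illustration of the lemma (a sketch under assumptions of our choosing, not part of the proof): take $X_i$ i.i.d. Bernoulli$(1/2)$, so that $U=\sum_i X_i 2^{-i}$ is uniform on $[0,1]$, and let $A=\{U < t\}$ for a non-dyadic threshold $t$. The finite-coordinate event $B_k=\{\sum_{i\leq k} X_i 2^{-i} < t\} \in \mathcal{F}_k$ approximates $A$ with error of order $2^{-k}$:

```python
import random

random.seed(0)

t = 1 / 3        # a non-dyadic threshold (illustrative choice)
K = 40           # deep truncation standing in for "all coordinates"
n_mc = 20000

def partial(bits, k):
    """Partial dyadic sum sum_{i<=k} X_i 2^{-i} of a bit sequence."""
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits[:k]))

samples = [[random.randint(0, 1) for _ in range(K)] for _ in range(n_mc)]

# Monte Carlo estimate of E|1_A - 1_{B_k}| for increasing k
est = {}
for k in (2, 6, 10):
    d = sum(1 for bits in samples
            if (partial(bits, k) < t) != (partial(bits, K) < t))
    est[k] = d / n_mc

# disagreement events are nested in k, so the estimates decrease,
# and the error is roughly of order 2^{-k}
assert est[2] >= est[6] >= est[10]
assert est[10] < 0.02
```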
Let $a<b$ and let $(X_k)_{k\ge0}$ be a real-valued process. We define an indicator process $(C_k)$ that tracks the phases during which the process is performing an upcrossing from $a$ to $b$.
Set $$ C_0 := \mathbf 1_{\{X_0<a\}}. $$ For all $k\ge1$, define recursively $$ C_k = \mathbf 1_{\{C_{k-1}=1\}}\mathbf 1_{\{X_{k-1}\le b\}} + \mathbf 1_{\{C_{k-1}=0\}}\mathbf 1_{\{X_{k-1}<a\}}. $$
Define also $$ Y_k := \sum_{\ell=1}^k C_\ell (X_\ell - X_{\ell-1}). $$
Interpretation.
Let $U_{0:n}[a,b]$ denote the number of completed upcrossings from $a$ to $b$ between times $0$ and $n$. Then the following inequality holds: $$ Y_n \ge (b-a)\,U_{0:n}[a,b] - (X_n-a)^-. $$
Each completed upcrossing from $a$ to $b$ contributes at least $b-a$ to the sum $Y_n$, since only the increments occurring during the corresponding phase are accumulated.
The only possible negative contribution comes from an unfinished upcrossing at time $n$, when $X_n<a$. This loss is exactly controlled by the term $(X_n-a)^-$.
Summing the contributions of all completed upcrossings and accounting for the final correction yields the stated inequality. ∎
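The recursion for $(C_k)$ and the inequality are easy to check numerically. Below is an illustrative Python sketch (the random-walk parameters are arbitrary choices of ours) that computes $Y_n$ and the completed-upcrossing count $U$, counting a completed upcrossing each time $C$ switches from $1$ to $0$:

```python
import random

def upcrossing_check(path, a, b):
    """Compute Y_n and the number U of completed upcrossings of [a, b]
    along path = (X_0, ..., X_n), following the recursion for (C_k)."""
    C_prev = 1 if path[0] < a else 0          # C_0
    Y, U = 0.0, 0
    for k in range(1, len(path)):
        if C_prev == 1:
            C = 1 if path[k - 1] <= b else 0  # phase ends once X exceeds b
        else:
            C = 1 if path[k - 1] < a else 0   # phase starts once X drops below a
        if C_prev == 1 and C == 0:
            U += 1                            # an upcrossing was completed
        Y += C * (path[k] - path[k - 1])
        C_prev = C
    return Y, U

# check Y_n >= (b - a) U - (X_n - a)^- on random walk paths
random.seed(1)
a, b = -1.0, 1.0
for _ in range(200):
    X, path = 0.0, [0.0]
    for _ in range(300):
        X += random.gauss(0.0, 1.0)
        path.append(X)
    Y, U = upcrossing_check(path, a, b)
    assert Y >= (b - a) * U - max(0.0, a - path[-1]) - 1e-9
```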
Let $(X_n)_{n\ge0}$ be a supermartingale such that $$ \sup_n \mathbb E[|X_n|] < \infty. $$ Then the limit $$ \lim_{n\to\infty} X_n $$ exists almost surely.
By the upcrossing inequality, for any $a<b$, $$ (b-a)\,\mathbb E[U_{0:n}[a,b]] \le \mathbb E[Y_n] + \mathbb E[(X_n-a)^-]. $$
Since $(X_n)$ is a supermartingale and $(C_k)$ is predictable, nonnegative and bounded, $(Y_n)$ is also a supermartingale; as $Y_0=0$, this gives $\mathbb E[Y_n]\le0$. The $L^1$-boundedness assumption implies that $\sup_n \mathbb E[(X_n-a)^-]<\infty$.
Letting $n\to\infty$, the monotone convergence theorem yields $$ \mathbb E[U_{0:\infty}[a,b]]<\infty, $$ so the total number of upcrossings of $[a,b]$ is almost surely finite. Applying this to every pair of rationals $a<b$, the process cannot oscillate infinitely many times across any nondegenerate interval, hence $\liminf_n X_n = \limsup_n X_n$ and $X_n$ converges almost surely; the limit is finite a.s. by Fatou's lemma and $L^1$-boundedness. ∎
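As an illustration of the convergence phenomenon (a sketch under assumptions of our choosing, not the proof): the Doob martingale $X_n=\mathbb E[Z\mid\mathcal F_n]$, where $\mathcal F_n$ is generated by the first $n$ binary digits of a uniform $U$ and $Z=f(U)$, is uniformly bounded, hence an $L^1$-bounded (super)martingale, and converges to $Z$. For $f(x)=x^2$ the conditional expectation has a closed form:

```python
# Doob martingale X_n = E[f(U) | first n binary digits of U], f(x) = x^2:
# X_n is the average of f over the dyadic interval of length 2^{-n}
# containing U, and |X_n - f(U)| <= 2 * 2^{-n} since f is 2-Lipschitz on [0,1].

def cond_exp(u, n):
    """E[f(U) | generation-n dyadic interval containing u], f(x) = x^2."""
    j = int(u * 2**n)                     # index of the dyadic interval
    p, q = j * 2.0**-n, (j + 1) * 2.0**-n
    return (p*p + p*q + q*q) / 3.0        # exact mean of x^2 over [p, q]

u = 0.6180339887                          # an arbitrary sample point (our choice)
Z = u * u
errs = [abs(cond_exp(u, n) - Z) for n in range(1, 30)]

# the deterministic Lipschitz bound confirms convergence along this path
assert all(errs[n] <= 2.0 * 2.0**-(n + 1) + 1e-12 for n in range(len(errs)))
assert errs[-1] < 1e-6
```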
Intuition. Conditioning with respect to smaller and smaller $\sigma$-fields means losing information. The conditional expectations stabilize and converge to the conditional expectation with respect to the remaining common information.
Let $(\mathcal F_{-n})_{n\ge1}$ be a decreasing sequence of $\sigma$-fields such that $$ \mathcal F_{-\infty} = \bigcap_{k\ge1}\mathcal F_{-k} \subseteq \cdots \subseteq \mathcal F_{-(n+1)} \subseteq \mathcal F_{-n} \subseteq \cdots \subseteq \mathcal F_{-1}. $$
For any random variable $Z$ satisfying $\mathbb E[|Z|]<\infty$, we have $$ \lim_{n\to\infty}\mathbb E[Z\mid\mathcal F_{-n}] = \mathbb E[Z\mid\mathcal F_{-\infty}] \quad\text{almost surely}. $$
Define the process $$ X_k := \mathbb E[Z\mid\mathcal F_k], \qquad -n\le k\le -1. $$ This is a martingale bounded in $L^1$.
Similarly as before, we define $C_{-n}=\mathsf{1}_{X_{-n} < a}$. For all $-n < k \leq -1$, we set $C_k=\mathsf{1}_{C_{k-1}=1} \mathsf{1}_{X_{k-1} \leq b}+ \mathsf{1}_{C_{k-1}=0} \mathsf{1}_{X_{k-1} <a}$. Define also $Y_k=\sum_{\ell=-n+1}^k C_\ell (X_\ell - X_{\ell-1})$. Then, for the same reasons, the number of upcrossings $U_{-n:-1}[a,b]$ between times $-n$ and $-1$ can be bounded as follows: $$ Y_{-1} \geq (b-a)\, U_{-n:-1}[a,b] - (X_{-1}-a)^- $$
As before, the integrability of $Z$ ensures that the expected number of upcrossings is finite, which implies that $(X_k)$ converges almost surely as $k\to-\infty$.
The limit is measurable with respect to each $\mathcal F_{-n}$, hence with respect to $\mathcal F_{-\infty}$; moreover, the family $(\mathbb E[Z\mid\mathcal F_{-n}])_{n}$ is uniformly integrable, so the convergence also holds in $L^1$, and passing to the limit in the defining property of conditional expectation identifies the limit as $\mathbb E[Z\mid\mathcal F_{-\infty}]$, which completes the proof. ∎
De Finetti's Theorem: Let $(X_i)_{i\in\mathbb{N}}$ be a family of exchangeable random elements taking values in a measurable space $(\mathsf{X},\mathcal{X})$, i.e. such that the joint distribution of the sequence is invariant under any permutation of finitely many indices. Then there exists a $\sigma$-field $\mathcal{G}_\infty$ such that, conditionally on $\mathcal{G}_\infty$, the random variables $(X_i)_{i\in\mathbb{N}}$ are independent and identically distributed (i.i.d.).
The proof is based on the paper “Uses of exchangeability” by J. F. C. Kingman.
Without loss of generality, we model $(X_i)_{i\in\mathbb{N}}$ as the coordinate projections on the canonical probability space $(\mathsf{X}^{\mathbb{N}},\mathcal{X}^{\otimes\mathbb{N}},\mathbb{P})$. We proceed as follows:
1. Reverse filtration construction: For each $n\in\mathbb{N}$, let $\mathcal{G}_n$ be the $\sigma$-field generated by the measurable functions $f:\mathsf{X}^{\mathbb{N}}\to\mathbb{R}$ that are invariant under permutations of the first $n$ coordinates, i.e. such that for every permutation $\pi$ of $[1:n]$, $$ f(x_1,\ldots,x_n,x_{n+1},\ldots)=f(x_{\pi(1)},\ldots,x_{\pi(n)},x_{n+1},\ldots). $$ The sequence $(\mathcal{G}_n)$ is nonincreasing; define $$ \mathcal{G}_\infty=\bigcap_{n\in\mathbb{N}}\mathcal{G}_n. $$
2. Conditional expectation and empirical averages: Let $h:\mathsf{X}\to\mathbb{R}$ be bounded and measurable, and let $A\in\mathcal{G}_n$. By exchangeability, $\mathbb{E}[h(X_i)\mathbf1_A]$ does not depend on $i\in[1:n]$, so $$ \mathbb{E}\!\left[\left(\frac1n\sum_{i=1}^n h(X_i)\right)\mathbf1_A\right] =\frac1n\sum_{i=1}^n \mathbb{E}\!\left[ h(X_i)\mathbf1_A\right] =\mathbb{E}[h(X_1)\mathbf1_A]. $$ Since $\frac1n\sum_{i=1}^n h(X_i)$ is invariant under permutations of the first $n$ coordinates, it is $\mathcal{G}_n$-measurable, and the identity above means that $$ \frac1n\sum_{i=1}^n h(X_i)=\mathbb{E}[h(X_1)\mid\mathcal{G}_n], \quad a.s. $$ By the reverse martingale convergence theorem, $$ \frac1n\sum_{i=1}^n h(X_i)\xrightarrow{\mathrm{a.s.}}\mathbb{E}[h(X_1)\mid\mathcal{G}_\infty]. $$
3. Multivariate functions: The same argument applied to a bounded measurable $f:\mathsf{X}^k\to\mathbb{R}$, averaging over $k$-tuples of distinct indices, yields $$ \mathbb{E}[f(X_1,\ldots,X_k)\mid\mathcal{G}_\infty] =\lim_{n\to\infty}\frac1{n(n-1)\cdots(n-k+1)} \sum_{\substack{i_{1:k} \in [1:n]^k\\ i_j\neq i_\ell}} f(X_{i_1},\ldots,X_{i_k}). $$ Moreover, since $f$ is bounded, the tuples with repeated indices are asymptotically negligible: $$ \frac1{n(n-1)\cdots(n-k+1)} \sum_{\substack{i_{1:k} \in [1:n]^k\\ i_j\neq i_\ell}} f(X_{i_1},\ldots,X_{i_k}) + O\lr{\frac1n} = \frac1{n^k}\sum_{i_1=1}^n\cdots\sum_{i_k=1}^n f(X_{i_1},\ldots,X_{i_k}). $$ Applying this with $f(x_1,\ldots,x_k)=f_1(x_1)\cdots f_k(x_k)$, the right-hand side factors into a product of empirical averages, each converging almost surely by Step 2, so that $$ \mathbb{E}[f_1(X_1)\cdots f_k(X_k)\mid\mathcal{G}_\infty] =\prod_{\ell=1}^k\mathbb{E}[f_\ell(X_1)\mid\mathcal{G}_\infty], $$ which is precisely conditional independence, with a common conditional distribution, given $\mathcal{G}_\infty$.
If $(X_i)$ are real-valued random variables, then, since the distribution of any random variable $X$ is completely determined by the values $\mathbb{P}(X \le x)$ for $x \in \mathbb{Q}$, we can replace $\mathcal{G}_\infty$ with the $\sigma$-field generated by the countable family of random variables \[ \left\{ \mathbb{E}\big[\mathbf{1}_{\{X_1 \le x\}} \mid \mathcal{G}_\infty \big] : x \in \mathbb{Q} \right\}. \] It follows that there exists a random variable $S$ such that, conditional on $S$, the sequence $(X_i)$ is independent and identically distributed.
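A numerical sketch of this picture (variable names and parameters are illustrative choices of ours): mixing i.i.d. Bernoulli sequences over a random bias $\theta$ produces an exchangeable sequence, and the empirical averages of Steps 2 and 3 recover the conditional moments given $\theta$, including the product structure for $k=2$:

```python
import random

random.seed(42)

# draw a random bias theta (playing the role of S / G_infty), then an
# i.i.d. Bernoulli(theta) sequence conditionally on theta: the resulting
# sequence is exchangeable
n = 20000
theta = random.random()
xs = [1 if random.random() < theta else 0 for _ in range(n)]

# (1/n) sum h(X_i) with h = identity: approximates E[X_1 | theta] = theta
mean1 = sum(xs) / n

# average over pairs of DISTINCT indices, as in the k = 2 formula:
# approximates E[X_1 X_2 | theta] = theta^2
s = sum(xs)
mean2 = (s * s - sum(x * x for x in xs)) / (n * (n - 1))

assert abs(mean1 - theta) < 0.05
assert abs(mean2 - theta ** 2) < 0.05
```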
The previous approach makes it possible to prove the strong law of large numbers.
Let \((X_i)\) be a sequence of independent and identically distributed (i.i.d.) random variables such that \(\mathbb{E}[|X_1|] < \infty\). Then, \[ \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{a.s.} \mathbb{E}[X_1]. \]
Proof: As shown in the proof of de Finetti's theorem, the empirical average is a reverse martingale: \(\frac1n\sum_{i=1}^n X_i = \mathbb{E}[X_1|\mathcal{G}_n]\). By the reverse martingale convergence theorem (see Upcrossing Inequality and Martingale Convergence), it converges almost surely to the conditional expectation with respect to the exchangeable σ-field \(\mathcal{G}_\infty\): \[ \frac{1}{n} \sum_{i=1}^n X_i = \mathbb{E}[X_1|\mathcal{G}_n] \xrightarrow{a.s.} \mathbb{E}[X_1|\mathcal{G}_\infty]. \]
The Hewitt–Savage 0-1 law states that every event in \(\mathcal{G}_\infty\) has probability 0 or 1; consequently, every \(\mathcal{G}_\infty\)-measurable random variable is almost surely constant. The conditional expectation \(\mathbb{E}[X_1|\mathcal{G}_\infty]\) is therefore almost surely equal to the unconditional expectation: \[ \mathbb{E}[X_1|\mathcal{G}_\infty] = \mathbb{E}[X_1] \quad \text{a.s.} \]
Thus, the empirical average converges almost surely to the theoretical expectation: \[ \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{a.s.} \mathbb{E}[X_1]. \]
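A quick numerical sanity check of the conclusion (illustrative parameters, not part of the proof):

```python
import random

random.seed(7)

# sample mean of i.i.d. Exponential(1) variables, for which E[X_1] = 1
n = 100000
total = 0.0
for _ in range(n):
    total += random.expovariate(1.0)   # Exponential(1): mean 1, variance 1

# the standard error is 1/sqrt(n) ~ 0.003, so a 0.05 tolerance is generous
assert abs(total / n - 1.0) < 0.05
```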