
$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bbQ}{{\mathbb Q}} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\condtrans}[3]{p_{#1}(#2|#3)} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\cro}[1]{\langle #1 \rangle} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Hset}{\mathsf{H}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\jointtrans}[3]{p_{#1}(#2,#3)} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} \newcommand{\lrb}[1]{\left[#1 \right]} \newcommand{\lrc}[1]{\left\{#1 \right\}} 
\newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcf}{\mathcal{F}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\N}{\mathcal{N}} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\pminfty}{_{-\infty}^\infty} \newcommand{\PP}{\mathbb P} \newcommand{\projorth}[1]{\mathsf{P}^\perp_{#1}} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\revcondtrans}[3]{q_{#1}(#2|#3)} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\weaklim}[1]{\stackrel{\mathcal{L}_{#1}}{\rightsquigarrow}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} \newcommand{\Zset}{\mathsf{Z}} $$

2017/10/07 23:39 · douc

Gaussian concentration inequality for Lipschitz functions

This is an interesting application of a symmetrisation argument. This article is almost directly taken from Terence Tao's webpage (Theorem 8); see also Theorem 2.1.12 in “Topics in Random Matrix Theory” by Terence Tao.

Theorem Let $(X_i)_{i\in [1:n]}$ be iid random variables with $X_i \sim N(0,\sigma^2)$ and set $X=(X_1,\ldots,X_n)$. Let $F: \rset^n \rightarrow \rset$ be a $1$-Lipschitz function with respect to the Euclidean distance $\|\cdot\|$ on $\rset^n$, i.e., $|F(x)-F(y)| \leq \|x-y\|$ for all $x,y \in \rset^n$. Then for every $\lambda>0$, \begin{equation} \label{eq:fond} \PP( |F(X) - \PE[F(X)]| \geq \lambda ) \leq 2 \exp(-2\lambda^2/(\sigma \pi)^2 ). \end{equation}
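Before turning to the proof, here is a quick Monte Carlo sanity check of \eqref{eq:fond} (not part of the original article; it assumes numpy). We take $F(x)=\max_i x_i$, which is $1$-Lipschitz for the Euclidean norm, and approximate $\PE[F(X)]$ by the empirical mean.

```python
import numpy as np

# Sanity check of the concentration bound with the 1-Lipschitz
# function F(x) = max_i x_i (Lipschitz since |max x - max y| <= max|x_i - y_i| <= ||x-y||).
rng = np.random.default_rng(0)
n, sigma, n_mc = 20, 1.0, 100_000
X = rng.normal(0.0, sigma, size=(n_mc, n))
F = X.max(axis=1)                    # F(X) for each Monte Carlo sample
dev = np.abs(F - F.mean())           # |F(X) - E[F(X)]|, expectation approximated empirically

for lam in (0.5, 1.0, 2.0):
    empirical = (dev >= lam).mean()                          # P(|F(X) - E F(X)| >= lam)
    bound = 2 * np.exp(-2 * lam**2 / (sigma * np.pi) ** 2)   # right-hand side of the theorem
    assert empirical <= bound        # the theoretical bound must dominate
```

The bound is loose for small $\lambda$ (it can exceed $1$), but it captures the Gaussian decay of the tail in $\lambda$.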

Proof

The original argument was given by Maurey and Pisier. We only prove the assertion under the more restrictive assumption that $F$ is differentiable with $\|\nabla F(x)\|\leq 1$ for all $x$. Let $Y$ be an independent copy of $X$. Then, by the Chernoff bound, for every $t>0$, $$ \PP( F(X) - \PE[F(Y)]\geq \lambda ) \leq \rme^{-\lambda t} \PE[\rme^{t (F(X)- \PE[F(Y)])}] \leq \rme^{-\lambda t} \PE[\rme^{t (F(X)-F(Y))}], $$ where the last inequality follows from Jensen's inequality. The symmetrisation argument thus comes from Jensen's inequality, and it now allows us to write the difference $F(X) - F(Y)$ in infinitesimal form: $$ F(X) - F(Y) = \int_0^{\pi/2} \frac{\rmd}{\rmd\theta} F(Y \cos \theta + X \sin \theta)\ \rmd\theta=\int_0^{\pi/2} \nabla F(U_\theta) \cdot V_\theta\ \rmd\theta, $$ where $U_\theta=Y \cos \theta + X \sin \theta$ and $V_\theta=-Y \sin \theta + X \cos \theta$ are Gaussian random vectors with the same law as $X$, and they are independent! This is an amazing trick: instead of the classical interpolation $F(Y(1-t) + X t)$ we use a “polar” interpolation, which produces uncorrelated terms (and therefore independent ones, since we are in the context of Gaussian vectors). We again make use of Jensen's inequality, combined with the tower property, to produce a term of the form $\PE[\rme^W]$ with $W$ Gaussian: $$ \PE[\rme^{t (F(X)-F(Y))}] \leq \frac{2}{\pi} \int_0^{\pi/2} \PE[ \rme^{\frac{\pi t}{2} \nabla F(U_\theta ) \cdot V_\theta }]\rmd\theta= \frac{2}{\pi} \int_0^{\pi/2} \PE[\rme^{\frac{(\sigma \pi t)^2}{8} \|\nabla F(U_\theta )\|^2}]\rmd\theta \leq \rme^{\frac{(\sigma \pi t)^2}{8}}. $$ The rest of the proof is now standard.
This proves that for every $t>0$, $$ \PP( F(X) - \PE[F(Y)]\geq \lambda ) \leq \rme^{-\lambda t+\frac{(\sigma \pi t)^2}{8}}. $$ Choosing $t$ such that $\frac{\rmd }{\rmd t} \lr{-\lambda t+\frac{(\sigma \pi t)^2}{8}}=0$ yields $t=4\lambda/(\sigma \pi)^2$ and \begin{equation} \label{eq:two} \PP( F(X) - \PE[F(Y)]\geq \lambda ) \leq \rme^{-2\lambda^2/(\sigma \pi )^2}. \end{equation} Since $\PE[F(Y)]=\PE[F(X)]$, applying the previous inequality with $F$ replaced by $-F$ and using $\PP( |A|\geq \lambda )=\PP( A\geq \lambda )+\PP( -A\geq \lambda )$ finishes the proof.
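The heart of the proof is that $U_\theta$ and $V_\theta$ are independent with the same law as $X$: componentwise, $\Cov(U_\theta,V_\theta) = \sin\theta\cos\theta\,\Var(X) - \cos\theta\sin\theta\,\Var(Y) = 0$, and uncorrelated jointly Gaussian vectors are independent. A short empirical check of this rotation identity (not from the original article; it assumes numpy):

```python
import numpy as np

# Empirical check that the "polar" interpolation variables
#   U = Y cos(theta) + X sin(theta),  V = -Y sin(theta) + X cos(theta)
# are uncorrelated and have the same variance sigma^2 as X.
rng = np.random.default_rng(1)
sigma, n_mc = 2.0, 500_000
X = rng.normal(0.0, sigma, n_mc)
Y = rng.normal(0.0, sigma, n_mc)    # independent copy of X
theta = 0.7                         # any fixed angle works

U = Y * np.cos(theta) + X * np.sin(theta)
V = -Y * np.sin(theta) + X * np.cos(theta)

corr = np.corrcoef(U, V)[0, 1]
assert abs(corr) < 0.01                           # empirically uncorrelated
assert abs(U.var() - sigma**2) < 0.05 * sigma**2  # Var(U) = sigma^2
assert abs(V.var() - sigma**2) < 0.05 * sigma**2  # Var(V) = sigma^2
```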

Consequences

We can see \eqref{eq:two} as $H\geq G$, where $H$ is the cumulative distribution function of $F(X)-\PE[F(X)]$ and $G(\lambda)=1-\rme^{-2\lambda^2/(\sigma \pi)^2}$. Then, for $U \sim \unif[0,1]$, we have $$ H^{\leftarrow}(U) \leq G^{\leftarrow}(U), $$ where $H^{\leftarrow}$ is the generalized inverse of the function $H$. Since $G^{\leftarrow}(U)=\sigma\pi\sqrt{-\ln(1-U)/2}$ and $V=-\ln(1-U)$ is exponentially distributed, there exists an exponential random variable $V$ such that $$ F(X)-\PE[F(X)] \leq \sigma \pi \sqrt{\frac{V}{2}}. $$ This bound can be improved with a more involved proof (see Theorem B.6 in “Introduction to High-Dimensional Statistics” by C. Giraud), which shows that $$ \PP( |F(X) - \PE[F(X)]| \geq \lambda ) \leq 2 \exp(-\lambda^2/(2\sigma^2) ). $$
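The improvement is in the constant inside the exponential: the symmetrisation proof gives exponent $(2/\pi^2)\,\lambda^2/\sigma^2 \approx 0.2026\,\lambda^2/\sigma^2$, while Giraud's bound gives $(1/2)\,\lambda^2/\sigma^2$, so the latter is always sharper. A two-line comparison (illustration only, not from the article):

```python
import math

# Compare the two tail bounds for a few values of lambda:
#   symmetrisation proof: 2 exp(-2 lam^2 / (sigma*pi)^2)  -> exponent (2/pi^2) lam^2/sigma^2
#   Giraud, Thm B.6:      2 exp(-lam^2 / (2 sigma^2))     -> exponent (1/2)    lam^2/sigma^2
sigma = 1.0
for lam in (0.5, 1.0, 3.0):
    sym = 2 * math.exp(-2 * lam**2 / (sigma * math.pi) ** 2)
    giraud = 2 * math.exp(-lam**2 / (2 * sigma**2))
    assert giraud <= sym     # Giraud's bound dominates since 1/2 > 2/pi^2
```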