$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bbQ}{{\mathbb Q}} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\condtrans}[3]{p_{#1}(#2|#3)} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\cro}[1]{\langle #1 \rangle} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Hset}{\mathsf{H}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\jointtrans}[3]{p_{#1}(#2,#3)} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lp}[1]{\mathsf{L}_{{#1}}} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} \newcommand{\lrb}[1]{\left[#1 \right]} 
\newcommand{\lrc}[1]{\left\{#1 \right\}} \newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcD}{\mathcal{D}} \newcommand{\mcf}{\mathcal{F}} \newcommand{\mcl}{\mathcal{L}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\N}{\mathcal{N}} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\pminfty}{_{-\infty}^\infty} \newcommand{\PP}{\mathbb P} \newcommand{\projorth}[1]{\mathsf{P}^\perp_{#1}} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\revcondtrans}[3]{q_{#1}(#2|#3)} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\weaklim}[1]{\stackrel{\mathcal{L}_{#1}}{\rightsquigarrow}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} \newcommand{\Zset}{\mathsf{Z}} $$
Let $f$ be a convex function to be minimized over a constraint set $\mcD$ that we now define. Let $(h_i)_{1 \leq i \leq n}$ be convex differentiable functions on $\Xset = \mathbb{R}^p$, representing inequality constraints, and let $(g_j)_{1 \leq j \leq m}$ be affine functions representing equality constraints: $$ g_j(x) = a_j^T x - b_j, \quad j=1,\dots,m. $$ Define the feasible set $$ \mcD = \bigcap_{i=1}^n \{h_i \leq 0\} \cap \bigcap_{j=1}^m \{g_j = 0\} \neq \emptyset. $$
Since each $h_i$ is convex and each $g_j$ is affine, the set $\mcD$ is convex. We aim at minimizing the convex function $f$ over the convex set $\mcD$.
For $(x,\lambda,\mu)\in \Xset\times (\mathbb{R}^+)^n \times \mathbb{R}^m$, we define the Lagrangian function $$ \mcl(x,\lambda,\mu) = f(x) + \sum_{i=1}^n \lambda_i h_i(x) + \sum_{j=1}^m \mu_j g_j(x). $$
For all $(x,\lambda,\mu) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$, $$ \inf_{x \in \Xset} \mcl(x,\lambda,\mu) \leq \mcl(x,\lambda,\mu). $$
Taking the supremum over $\lambda \ge 0, \mu \in \mathbb{R}^m$ (where the notation $\lambda \geq 0$ means that all the components of $\lambda$ are non-negative) yields $$ \sup_{\lambda \geq 0,\ \mu} \inf_{x \in \Xset} \mcl(x,\lambda,\mu) \leq \sup_{\lambda \geq 0,\ \mu} \mcl(x,\lambda,\mu) = \infty \mathbf{1}_{x \notin \mcD} + f(x) \mathbf{1}_{x \in \mcD}. $$
Taking the infimum over $x \in \Xset$ leads to the weak duality relation: \begin{equation} \label{eq:weak} \sup_{\lambda \geq 0,\ \mu} \inf_{x \in \Xset} \mcl(x,\lambda,\mu) \leq \inf_{x \in \Xset} \sup_{\lambda \geq 0,\ \mu} \mcl(x,\lambda,\mu) = \inf_{x \in \mcD} f(x). \end{equation}
Observe that in the lhs (left-hand side), the infimum is taken over $x \in \Xset$, hence no constraint is imposed. In contrast, the rhs (right-hand side) corresponds to the constrained problem.
The rhs is called the primal problem, while the lhs is referred to as the dual problem. Since $x \mapsto \mcl(x,\lambda,\mu)$ is convex, the dual problem $\sup_{\lambda\geq 0,\mu}\inf_{x \in \Xset} \mcl(x,\lambda,\mu)$ is equivalent to $$ \sup \{\mcl(x,\lambda,\mu)\,:\ \lambda \geq 0,\ \mu \in \rset^m,\ \nabla_x \mcl(x,\lambda,\mu)=0\}. $$
This formulation is often useful when solving the optimization problem.
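As a sanity check, the duality chain above can be evaluated numerically on a toy problem of our own choosing (a one-dimensional quadratic with a single inequality constraint, not an example from the text); the stationarity trick of the last display gives the inner infimum in closed form.

```python
import numpy as np

# Toy problem (our illustration): minimize f(x) = x^2 subject to
# h(x) = 1 - x <= 0, i.e. x >= 1.  Primal optimum: f(1) = 1.

def lagrangian(x, lam):
    # L(x, lambda) = f(x) + lambda * h(x)
    return x ** 2 + lam * (1.0 - x)

def dual(lam):
    # inf_x L(x, lam): stationarity d/dx L = 2x - lam = 0 gives x = lam / 2,
    # hence g(lam) = lam - lam^2 / 4.
    return lam - lam ** 2 / 4.0

lams = np.linspace(0.0, 10.0, 1001)                   # grid over lambda >= 0
dual_value = max(dual(l) for l in lams)               # sup_{lam>=0} inf_x L
primal_value = min(x ** 2 for x in np.linspace(1.0, 5.0, 1001))  # inf_{x in D} f

print(dual_value, primal_value)
```

Here the dual optimum ($\lambda^*=2$, value $1$) matches the primal optimum $f(1)=1$, so the toy problem also exhibits the strong duality discussed next.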
To obtain equality in \eqref{eq:weak} (known as the strong duality relation), additional assumptions are required, such as the existence of a Slater point. Before discussing this, we introduce a useful intermediate result.
Lemma
Assume there exists $(x^*, \lambda^*, \mu^*) \in \mcD \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ satisfying the KKT conditions: the stationarity condition $$ \nabla f(x^*) + \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^m \mu_j^* \nabla g_j(x^*) = 0, $$ and the complementary slackness condition $\lambda_i^* h_i(x^*) = 0$ for all $i \in [1:n]$.
Then $$ f(x^*) = \inf_{x \in \mcD} f(x) = \mcl(x^*, \lambda^*, \mu^*), $$ and strong duality holds.
Proof
Let $x \in \mcD$. By convexity of $f$, we have $$ f(x) - f(x^*) \geq \nabla f(x^*)^T (x - x^*). $$
Using the definition of the Lagrangian and the stationarity condition, we obtain $$ \nabla f(x^*) + \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^m \mu_j^* \nabla g_j(x^*) = 0, $$ which implies \begin{equation} \label{eq:grad} f(x) - f(x^*) \geq \nabla f(x^*)^T (x - x^*) = - \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*)^T (x - x^*) - \sum_{j=1}^m \mu_j^* \nabla g_j(x^*)^T (x - x^*). \end{equation}
Inequality constraints: $h_i$: by convexity, for any $x \in \mcD$, $$ - \nabla h_i(x^*)^T (x - x^*) \ge h_i(x^*) - h_i(x) \ge h_i(x^*), $$ hence $$ - \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*)^T (x - x^*) \ge \sum_{i=1}^n \lambda_i^* h_i(x^*) = 0. $$
Equality constraints: $g_j$: since $g_j$ is affine and $x \in \mcD$, we have $g_j(x) = g_j(x^*) = 0$, hence $$ \nabla g_j(x^*)^T (x - x^*) = g_j(x) - g_j(x^*)= 0. $$
Combining these relations in \eqref{eq:grad}, we obtain that for any $x \in \mcD$, $$ f(x) - f(x^*) \ge \nabla f(x^*)^T (x - x^*) \ge 0, $$ which proves that $f(x^*) = \inf_{x \in \mcD} f(x)= \mcl(x^*, \lambda^*, \mu^*)$ (the second equality uses $\lambda_i^* h_i(x^*)=0$ and $g_j(x^*)=0$). Since $x \mapsto \mcl(x,\lambda^*,\mu^*)$ is convex and stationary at $x^*$, $$ \mcl(x^*, \lambda^*, \mu^*) = \inf_{x\in \Xset} \mcl(x, \lambda^*, \mu^*) \leq \sup_{\lambda \geq 0,\mu} \inf_{x\in \Xset} \mcl(x, \lambda, \mu), $$ which is the converse inequality in \eqref{eq:weak}, so strong duality holds.
Definition
We say that $(x^*, \lambda^*, \mu^*) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ is a saddle point of the Lagrange function $\mcl$ if for every $(x, \lambda, \mu) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$, $$ \mcl(x^*, \lambda, \mu) \leq \mcl(x, \lambda^*, \mu^*). $$ Equivalently (take $x=x^*$, then $(\lambda,\mu)=(\lambda^*,\mu^*)$), $\mcl(x^*, \lambda, \mu) \leq \mcl(x^*, \lambda^*, \mu^*) \leq \mcl(x, \lambda^*, \mu^*)$.
Proposition
If $(x^*, \lambda^*, \mu^*) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ is a saddle point for $\mcl$, then strong duality holds, and the KKT conditions are satisfied at $(x^*, \lambda^*, \mu^*)$.
Proof
By the saddle point property, $$ \sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) \leq \inf_{x \in \Xset} \mcl(x, \lambda^*, \mu^*). $$ Hence, $$ \inf_{x \in \Xset} \sup_{\lambda \geq 0, \mu} \mcl(x, \lambda, \mu) \leq \sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) \leq \inf_{x \in \Xset} \mcl(x, \lambda^*, \mu^*) \leq \sup_{\lambda \geq 0, \mu} \inf_{x \in \Xset} \mcl(x, \lambda, \mu). $$
This chain of inequalities implies strong duality (see \eqref{eq:weak} for the reverse inequality).
Moreover, the upper bound in the saddle point property, which holds for all $\lambda \geq 0, \mu$, implies that $\sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) <\infty$ and hence $h_i(x^*) \leq 0$ and $g_j(x^*) = 0$.
Finally, choosing $\lambda = 0$ and $\mu = 0$, we obtain $$ f(x^*) = \mcl(x^*, 0, 0) \le \mcl(x, \lambda^*, \mu^*) \le f(x), \quad \forall x \in \mcD, $$ which shows that $x^*$ minimizes $f$ over $\mcD$. Taking $x = x^*$ in the same inequality yields $\sum_{i=1}^n \lambda_i^* h_i(x^*) \geq 0$; since each term is non-positive, $\lambda_i^* h_i(x^*) = 0$ for all $i$.
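To make the saddle-point property concrete, here is a small numerical check on an illustrative problem of our own (minimize $x^2$ under $1-x\leq 0$, whose saddle point is $x^*=1$, $\lambda^*=2$); this is a sketch, not an example from the text.

```python
import numpy as np

# Illustrative check of the saddle-point property for f(x) = x^2,
# h(x) = 1 - x (so x >= 1).  Candidate saddle point: x* = 1, lambda* = 2.

def L(x, lam):
    return x ** 2 + lam * (1.0 - x)

x_star, lam_star = 1.0, 2.0
xs = np.linspace(-3.0, 3.0, 601)       # unconstrained x grid
lams = np.linspace(0.0, 10.0, 501)     # lambda >= 0 grid

# Saddle point: L(x*, lam) <= L(x*, lam*) <= L(x, lam*) for all x, lam >= 0.
left_ok = all(L(x_star, lam) <= L(x_star, lam_star) + 1e-12 for lam in lams)
right_ok = all(L(x_star, lam_star) <= L(x, lam_star) + 1e-12 for x in xs)
print(left_ok, right_ok)
```

Note that $\mcl(1,\lambda)=1$ for every $\lambda$, while $\mcl(x,2)=(x-1)^2+1\geq 1$, so both inequalities hold with equality on the left.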
We now assume the existence of a Slater point, that is, there exists $\tilde{x} \in \mcD$ such that for all $i \in \{1, \ldots, n\}$, $h_i(\tilde{x}) < 0$.
Lemma (Farkas)
$$ \{x \in \mcD : f(x) < 0\} = \emptyset \iff \exists \lambda^* \ge 0,\ \mu^* \in \mathbb{R}^m \text{ s.t. } f(x) + \sum_{i=1}^n \lambda_i^* h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x) \ge 0 \quad \forall x \in \Xset. $$
Proof
Recall that $$ g_j(x) = a_j^T x - b_j, \quad j=1,\dots,m. $$ Without loss of generality, we assume that $(a_j)_{1\leq j \leq m}$ are \textbf{linearly independent}. We define the set $U$ as $$ U = \{ u = (u_0, u_1, \dots, u_n, u_{n+1}, \dots, u_{n+m}) \in \rset^{n+m+1}: \exists x \in \Xset, f(x) < u_0, h_i(x) \le u_i, g_j(x) = u_{n+j} \}. $$
First note that the condition $\{x \in \mcD : f(x) < 0\} = \emptyset$ is equivalent to $0 \notin U$. The converse implication is immediate: if $(\lambda^*,\mu^*)$ as in the statement exist, then for every $x \in \mcD$, $f(x) \geq -\sum_{i=1}^n \lambda_i^* h_i(x) - \sum_{j=1}^m \mu_j^* g_j(x) \geq 0$, so the set on the left-hand side is empty. It remains to prove the direct implication. Since $U$ is convex and $0 \notin U$, a separation argument yields a nonzero vector $\phi \in \mathbb{R}^{n+m+1}$ such that $$ \phi^T u \ge 0, \quad \forall u \in U. $$
Take an arbitrary $i \in [0:n]$. If $u\in U$, then $u+t e_i \in U$ for every $t>0$, where $e_i=(\indiacc{k=i})_{k \in [0:n+m]} \in \rset^{n+m+1}$. The previous inequality then gives $\phi^T (u+t e_i)=\phi^T u + t\phi_i \geq 0$ for all $t>0$, hence $\phi_i\geq 0$ for every $i \in [0:n]$. Moreover, since $\phi^T u\geq 0$ for all $u \in U$, a simple limiting argument yields, for all $x\in \Xset$, $$ \phi_0 f(x)+\sum_{i=1}^n \phi_i h_i(x) +\sum_{j=1}^{m} \phi_{n+j} g_j(x)\geq 0. $$ Since we already know that $\phi_i\geq 0$ for all $i \in [0:n]$, it only remains to prove that $\phi_0 \neq 0$; we then set $\lambda^*_i=\frac{\phi_i}{\phi_0}$ for $i\in [1:n]$ and $\mu_j^*=\frac{\phi_{n+j}}{\phi_0}$ for $j \in [1:m]$. We argue by contradiction. If $\phi_0=0$, then applying the previous inequality at a Slater point $x=\tilde x \in \mcD$ yields $\sum_{i=1}^n \phi_i h_i(\tilde x) \geq 0$; since $h_i(\tilde x)<0$ for all $i \in [1:n]$ and $\phi_i \geq 0$, this forces $\phi_i=0$ for all $i \in [1:n]$. The inequality then becomes, for any $x\in \Xset$, $$ \sum_{j=1}^{m} \phi_{n+j} g_j(x)= \lr{\sum_{j=1}^{m} \phi_{n+j} a_j}^T x - \sum_{j=1}^{m} \phi_{n+j} b_j\geq 0, $$ which implies that $\sum_{j=1}^{m} \phi_{n+j} a_j=0$ (a nonzero linear form is unbounded below); since $(a_j)_{1\leq j \leq m}$ are linearly independent, we deduce that $\phi_{n+j}=0$ for every $j\in [1:m]$. All the components of $\phi$ are then zero, contradicting $\phi \neq 0$. This completes the proof.
Theorem (KKT)
If $x^* \in \mcD$ minimizes $f$ over $\mcD$ and a Slater point exists, then there exist $\lambda^* \ge 0$ and $\mu^* \in \mathbb{R}^m$ such that $(x^*,\lambda^*,\mu^*)$ is a saddle point of $\mcl$ and hence the KKT conditions hold:
$$ \nabla_x f(x^*) + \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^m \mu_j^* \nabla g_j(x^*) = 0 $$
$$ h_i(x^*) \le 0,\ \lambda_i^* \ge 0,\ \lambda_i^* h_i(x^*) = 0 \quad \forall i $$
$$ g_j(x^*) = 0 \quad \forall j $$
and strong duality holds.
Proof
Assume that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in \mcD$. Then $\{x \in \mcD : f(x) - f(x^*) < 0\} = \emptyset$. Applying the Farkas lemma to $f - f(x^*)$, there exist $\lambda^* \geq 0$ and $\mu^* \in \mathbb{R}^m$ such that for all $x\in \Xset$, $$ f(x)-f(x^*)+\sum_{i=1}^n \lambda^*_i h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x)\geq 0. $$
This implies that $(x^*,\lambda^*,\mu^*)$ is a saddle point. Indeed, for all $\lambda \geq 0$, $\mu \in \rset^m$ and $x \in \Xset$, using $h_i(x^*) \leq 0$ and $g_j(x^*) = 0$, $$ f(x^*)+\sum_{i=1}^n \lambda_i h_i(x^*) + \sum_{j=1}^m \mu_j g_j(x^*)\leq f(x^*) \leq f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x). $$
Therefore, $(x^*,\lambda^*,\mu^*)$ is a saddle point, which implies strong duality (as shown previously). This concludes the proof.
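As an application of the theorem, the KKT system can be verified numerically on a classical example of our own choosing: projecting a point onto the probability simplex. The sorting-based projection below and all names are our assumptions, not material from the text.

```python
import numpy as np

# Illustrative example: minimize f(x) = ||x - c||^2 subject to
# g(x) = sum(x) - 1 = 0 and h_i(x) = -x_i <= 0 (probability simplex).

def project_simplex(c):
    # Standard sorting-based projection: x_i = max(c_i - tau, 0), with tau
    # chosen so that the coordinates sum to one.
    u = np.sort(c)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.max(np.where(u - css / (np.arange(len(c)) + 1.0) > 0)[0])
    tau = css[rho] / (rho + 1.0)
    return np.maximum(c - tau, 0.0), tau

c = np.array([0.9, 0.5, -0.2, 0.1])
x, tau = project_simplex(c)

# Multipliers: stationarity reads 2(x_i - c_i) - lam_i + mu = 0, with
# lam_i >= 0 for h_i and mu free for g; on active coordinates lam_i = 0,
# which forces mu = 2 * tau.
mu = 2.0 * tau
lam = 2.0 * (x - c) + mu

print(x, lam, mu)
```

For this $c$ the solution is $x=(0.7,0.3,0,0)$ with $\lambda=(0,0,0.8,0.2)$: the multipliers vanish exactly on the coordinates where the inequality constraint is inactive, as complementary slackness requires.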
Proposition
Let $X_1,\ldots, X_n$ be $n$ points in $\mathbb{R}$ ordered in non-decreasing order, and let $Y_1,\ldots, Y_n$ be $n$ other points in $\mathbb{R}$, also ordered in non-decreasing order. Then
$$ \mathcal{W}_p \left(\frac{1}{n}\sum_{i=1}^n \delta_{X_i}, \frac{1}{n}\sum_{i=1}^n \delta_{Y_i}\right) = \left( \frac{1}{n} \sum_{i=1}^n |X_i - Y_i|^p \right)^{1/p}. $$
Proof
In order to prove the proposition, we will show that $$ \mathrm{argmin}\left\{ \sum_{i,j=1}^n p_{ij} |X_i- Y_j|^p \,: \forall i,j,\ p_{ij} \geq 0,\ \forall i, \sum_{j} p_{ij}=\frac{1}{n},\ \forall j, \sum_{i} p_{ij}=\frac{1}{n} \right\}= \left[ \frac{1}{n} \mathsf{1}(i=j) \right]_{1\leq i,j \leq n}. $$
The function $p \mapsto \sum_{i,j} p_{ij}|X_i-Y_j|^p$ is linear, hence convex. Moreover, the equality constraints are affine: for all $i\in [1:n]$, $\sum_{j=1}^n p_{ij}=\frac{1}{n}$ and for all $j\in [1:n]$, $\sum_{i=1}^n p_{ij}=\frac{1}{n}$, together with the inequality constraints $-p_{ij} \leq 0$ for all $i,j \in [1:n]$.
Therefore, we can apply the KKT theorem. We seek $p\in \rset^{n\times n}$, $\alpha,\beta \in \rset^n$, and $\gamma \in (\rset^+)^{n \times n}$ such that, defining $$ \mathcal{L}(p,\alpha,\beta,\gamma)=\sum_{i,j=1}^n p_{ij}|X_i-Y_j|^p - \sum_{i,j=1}^n \gamma_{ij} p_{ij}+\sum_{i=1}^n \alpha_i \lr{\sum_{j=1}^n p_{ij}- \frac 1 n} +\sum_{j=1}^n \beta_j \lr{\sum_{i=1}^n p_{ij}- \frac 1 n}, $$ we have $\nabla_p \mathcal{L}(p,\alpha,\beta,\gamma)=0$, together with the complementary slackness conditions $\gamma_{ij} p_{ij}=0$ for all $i,j \in [1:n]$ and the marginal constraints $\sum_{j=1}^np_{ij}=\sum_{i=1}^n p_{ij}=1/n$.
We already know that $p_{ij}=\frac 1 n \indi{i=j}$ is a solution. It therefore remains to find $\alpha,\beta,\gamma$ such that the KKT conditions are satisfied for this choice of $p$. These conditions read: for all $i,j \in [1:n]$, $$ |X_i-Y_j|^p - \gamma_{ij} + \alpha_i + \beta_j = 0, \qquad \gamma_{ij} \geq 0, \qquad \gamma_{ij} p_{ij} = 0. $$
The complementarity condition reduces to $\gamma_{ii}=0$ for all $i$, since $p_{ij}=\frac 1 n \indi{i=j}$. Hence, the KKT conditions are equivalent to: for all $i,j \in [1:n]$, $\gamma_{ij}=\alpha_i+\beta_j + |X_i-Y_j|^p \geq 0$ and $\gamma_{ii}=\alpha_i+\beta_i + |X_i-Y_i|^p=0$.
This is in turn equivalent to the existence of $\beta \in \rset^n$ such that, for all $i,j \in [1:n]$, $$ \beta_j - \beta_i \geq |X_i-Y_i|^p - |X_i - Y_j|^p, $$ in which case we set $\alpha_i = -\beta_i - |X_i-Y_i|^p$.
Set $\beta_1=0$ and, for all $j\in [2:n]$, define $\beta_j=\sum_{\ell=1}^{j-1} \lrcb{|X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p}$. Then, for $i \leq j$, $$ \beta_j - \beta_i=\sum_{\ell=i}^{j-1} \lrcb{|X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p}. $$
Since $u \mapsto |u|^p$ is convex, we have $|a+c|^p-|a|^p \geq |b+c|^p-|b|^p$ for all $a \geq b$ and $c\geq 0$. We set $a=X_\ell -Y_{\ell+1}$, $b=X_i-Y_{\ell+1}$, and $c=Y_{\ell+1}-Y_\ell$. For $\ell \geq i$, we have $a \geq b$ and $c\geq 0$. Hence, for any $\ell \geq i$, the inequality can be rewritten as $$ |X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p \geq |X_i-Y_\ell|^p - |X_i - Y_{\ell+1}|^p. $$
Finally, plugging into the previous identity yields, for $i \leq j$, $$ \beta_j-\beta_i \geq \sum_{\ell=i}^{j-1} \lrcb{|X_i-Y_\ell|^p - |X_i - Y_{\ell+1}|^p} = |X_i-Y_i|^p - |X_i - Y_j|^p. $$ The case $j < i$ follows from the same convexity inequality, now applied with $a=X_i-Y_{\ell+1}$ and $b=X_\ell-Y_{\ell+1}$ for $\ell < i$: it gives $|X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p \leq |X_i-Y_\ell|^p - |X_i - Y_{\ell+1}|^p$, and summing over $\ell \in [j:i-1]$ yields $\beta_j - \beta_i \geq |X_i-Y_i|^p - |X_i-Y_j|^p$ as well.
This proves the KKT conditions, and the proof is complete.
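The dual certificate built in this proof can be checked numerically; the sketch below (our own illustration) constructs $\beta$ exactly as above, sets $\alpha_i=-\beta_i-|X_i-Y_i|^p$, and verifies that the resulting $\gamma$ is non-negative with a zero diagonal.

```python
import numpy as np

# Numerical check (illustrative) of the certificate (alpha, beta, gamma)
# constructed in the proof, for random sorted samples.
rng = np.random.default_rng(0)
n, p = 6, 2
X = np.sort(rng.normal(size=n))
Y = np.sort(rng.normal(size=n))

C = np.abs(X[:, None] - Y[None, :]) ** p          # cost matrix |X_i - Y_j|^p

# beta_1 = 0 and beta_j = sum_{l < j} (|X_l - Y_l|^p - |X_l - Y_{l+1}|^p)
beta = np.concatenate(([0.0], np.cumsum(np.diag(C)[:-1] - np.diag(C, 1))))
alpha = -beta - np.diag(C)                        # forces gamma_ii = 0
gamma = alpha[:, None] + beta[None, :] + C        # gamma_ij = alpha_i + beta_j + C_ij

print(gamma.min())
```

A non-negative minimum of `gamma` confirms dual feasibility, and the zero diagonal is exactly complementary slackness for the identity coupling.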
The proposition shows that, for any $p \geq 1$, $$ \mathcal{W}_p(\mu,\nu) = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}, $$ (since $F_\mu^{-1}=X_i$ and $F_\nu^{-1}=Y_i$ on $\left((i-1)/n,\, i/n\right]$),
where $$ \mu=\frac{1}{n} \sum_{i=1}^n \delta_{X_i}, \qquad \nu=\frac{1}{n} \sum_{i=1}^n \delta_{Y_i}. $$
We only assume that the sequences $(X_i)$ and $(Y_i)$ are ordered in non-decreasing order; in particular, repetitions are allowed.
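For small $n$, the optimality of the sorted (identity) matching can be brute-forced: by Birkhoff's theorem, the linear transport problem over doubly stochastic matrices attains its minimum at a permutation matrix, so it suffices to enumerate permutations. This check is our own illustration.

```python
import itertools
import numpy as np

# Brute-force check (illustrative): compare the sorted matching with the
# minimum over all permutations of the Y sample.
rng = np.random.default_rng(1)
n, p = 5, 2
X = np.sort(rng.normal(size=n))
Y = np.sort(rng.normal(size=n))

sorted_cost = np.mean(np.abs(X - Y) ** p) ** (1.0 / p)
brute_cost = min(
    np.mean(np.abs(X - Y[list(sigma)]) ** p) ** (1.0 / p)
    for sigma in itertools.permutations(range(n))
)
print(sorted_cost, brute_cost)
```

The two costs agree: among all $n!$ matchings, pairing the order statistics is optimal, as the proposition states.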
We then extend this result to arbitrary discrete probability measures. Let $$ \mu=\sum_{i=1}^n \lambda_i \delta_{X_i}, \qquad \nu=\sum_{j=1}^m \gamma_j \delta_{Y_j}, $$
where the coefficients $(\lambda_i)$ and $(\gamma_j)$ are non-negative rational numbers summing to $1$. By the previous result (write all the weights with a common denominator $N$ and replicate each atom accordingly, so that both measures become uniform over $N$ ordered points), we still have $$ \mathcal{W}_p(\mu,\nu) = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}. $$
By a density argument, this identity extends to coefficients $(\lambda_i)$ and $(\gamma_j)$ with non-negative real values summing to $1$. Finally, by approximation, we obtain that for any probability measures $\mu$ and $\nu$ on $(\rset,\mathcal{B}(\rset))$, $$ \mathcal{W}_p(\mu,\nu) = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}. $$
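The final identity can be put to work directly. The sketch below (our own code, with rational weights chosen to enable the comparison) integrates the piecewise-constant quantile functions exactly over a merged breakpoint grid and checks the result against the atom-replication argument used above.

```python
import numpy as np

# Illustrative sketch: W_p between two discrete measures on R via quantile
# functions.  Atoms must be listed in non-decreasing order.

def wasserstein_quantile(xs, wx, ys, wy, p):
    cx, cy = np.cumsum(wx), np.cumsum(wy)
    # Both quantile functions are piecewise constant, so the integral over
    # [0, 1] is exact on the grid of all cumulative-weight breakpoints.
    grid = np.unique(np.concatenate(([0.0], cx, cy)))
    total = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        u = 0.5 * (a + b)                                   # any point of (a, b)
        qx = xs[min(np.searchsorted(cx, u), len(xs) - 1)]   # F_mu^{-1}(u)
        qy = ys[min(np.searchsorted(cy, u), len(ys) - 1)]   # F_nu^{-1}(u)
        total += (b - a) * abs(qx - qy) ** p
    return total ** (1.0 / p)

# mu = (1/4) d_0 + (3/4) d_1 and nu = (1/2) d_{-1} + (1/2) d_2.
xs, wx = np.array([0.0, 1.0]), np.array([0.25, 0.75])
ys, wy = np.array([-1.0, 2.0]), np.array([0.5, 0.5])
w_quantile = wasserstein_quantile(xs, wx, ys, wy, p=2)

# Same value via replication over the common denominator N = 4.
X = np.repeat(xs, [1, 3])
Y = np.repeat(ys, [2, 2])
w_replicated = np.mean(np.abs(X - Y) ** 2) ** 0.5

print(w_quantile, w_replicated)
```

Both computations return $\sqrt{7/4}$, illustrating that the quantile formula and the replicated uniform matching agree, as the density argument asserts.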