This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
|
world:kkt [2025/10/08 02:30] rdouc |
world:kkt [2026/04/09 23:17] (current) rdouc |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| {{page>:defs}} | {{page>:defs}} | ||
| - | {{tag>karush_kuhn_tucker kkt farkas_lemma}} | ||
| + | ====== kkt conditions ====== | ||
| + | |||
| + | ===== Weak duality ===== | ||
| + | |||
| + | Let $f$ a convex function to be minimized on a restricted set $\mcD$ that will be now defined. Let $(h_i)_{1 \leq i \leq n}$ be convex differentiable functions on $\Xset = \mathbb{R}^p$, representing //inequality constraints//. Let $(g_j)_{1 \leq j \leq m}$ be //affine equality constraints//: | ||
| $$ | $$ | ||
| - | \newcommand{\mcD}{\mathcal D} | + | g_j(x) = a_j^T x - b_j, \quad j=1,\dots,m. |
| - | \newcommand{\mcl}{\mathcal L} | + | $$ |
| + | Define the feasible set | ||
| + | $$ | ||
| + | \mcD = \bigcap_{i=1}^n \{h_i \leq 0\} \cap \bigcap_{j=1}^m \{g_j = 0\} \neq \emptyset. | ||
| $$ | $$ | ||
| - | ====== Weak duality ====== | + | Since each $h_i$ is convex and each $g_j$ is affine, the set $\mcD$ is convex. We aim at minimizing the convex function $f$ over the convex set $\mcD$. |
| - | Let $f$ and $(h_i)_{1 \leq i \leq n}$ be convex differentiable functions on $\Xset=\rset^p$ with $\lim_{|x| \to \infty} f(x)=\infty$. Let $\mcD=\cap_{i=1}^n \{h_i \leq 0\}\neq \emptyset$ and note that by convexity of the functions $h_i$, the set $\mcD$ is actually a convex set. We are interested in the infimum of the convex function $f$ on the convex set $\mcD$. Define the Lagrange function | + | |
| + | For $(x,\lambda,\mu)\in \Xset\times (\mathbb{R}^+)^n \times \mathbb{R}^m$, we define the Lagrangian function | ||
| $$ | $$ | ||
| - | \mcl(x,\lambda)=f(x)+\sum_{i=1}^n \lambda_i h_i(x) | + | \mcl(x,\lambda,\mu) = f(x) + \sum_{i=1}^n \lambda_i h_i(x) + \sum_{j=1}^m \mu_j g_j(x), |
| $$ | $$ | ||
| - | where $\lambda=(\lambda_1,\ldots,\lambda_n)^T \in \rset^n$. We first show the weak duality relation. Start with this simple remark: for all $(x,\lambda) \in \Xset \times \rset^n$, | + | |
| + | For all $(x,\lambda,\mu) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$, | ||
| $$ | $$ | ||
| - | \inf_{x \in \Xset} \mcl(x,\lambda) \leq \mcl(x,\lambda) | + | \inf_{x \in \Xset} \mcl(x,\lambda,\mu) \leq \mcl(x,\lambda,\mu). |
| $$ | $$ | ||
| - | Taking the supremum wrt $\lambda \geq 0$ on both sides yields: | + | |
| + | Taking the supremum over $\lambda \ge 0, \mu \in \mathbb{R}^m$ (where the notation $\lambda \geq 0$ means that all the components of $\lambda$ are non-negative) yields | ||
| $$ | $$ | ||
| - | \sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda) \leq \sup_{\lambda\geq 0} \mcl(x,\lambda)=\infty \indin{x\notin \mcD}+f(x)\indin{x\in \mcD} | + | \sup_{\lambda \geq 0,\ \mu} \inf_{x \in \Xset} \mcl(x,\lambda,\mu) \leq \sup_{\lambda \geq 0,\ \mu} \mcl(x,\lambda,\mu) |
| + | = \infty \mathbf{1}_{x \notin \mcD} + f(x) \mathbf{1}_{x \in \mcD}. | ||
| $$ | $$ | ||
| - | And taking now the infimum wrt $x \in \Xset$ on both sides, we finally get the <color red>**weak duality relation**</color>: | + | |
| + | Taking the infimum over $x \in \Xset$ leads to the <color red>weak duality relation </color>: | ||
| \begin{equation} \label{eq:weak} | \begin{equation} \label{eq:weak} | ||
| - | \sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda) \leq \inf_{x \in \Xset} \sup_{\lambda\geq 0}\mcl(x,\lambda)=\inf_{x \in \mcD} f(x) | + | \sup_{\lambda \geq 0,\ \mu} \inf_{x \in \Xset} \mcl(x,\lambda,\mu) |
| + | \leq \inf_{x \in \Xset} \sup_{\lambda \geq 0,\ \mu} \mcl(x,\lambda,\mu) | ||
| + | = \inf_{x \in \mcD} f(x). | ||
| \end{equation} | \end{equation} | ||
| - | Note that in the lhs (left-hand side), the infimum is wrt $x\in \Xset$ and therefore, there is no constraint, which is nice... | + | |
| - | The rhs (right-hand side) is called the //primal problem// and the lhs the //dual problem//. Note, since $x\mapsto \mcl(x,\lambda)$ is convex, that the <color red>dual problem</color> $\sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda)$ is equivalent to | + | Observe that in the lhs (left-hand side), the infimum is taken over $x \in \Xset$, hence no constraint is imposed. In contrast, the rhs (right-hand side) corresponds to the constrained problem. |
| + | |||
| + | The rhs is called the <color red>//primal problem//</color>, while the lhs is referred to as the <color red>//dual problem//</color>. Since $x \mapsto \mcl(x,\lambda,\mu)$ is convex, the dual problem $\sup_{\lambda\geq 0,\mu}\inf_{x \in \Xset} \mcl(x,\lambda,\mu)$ is equivalent to | ||
| $$ | $$ | ||
| - | \sup \set{\mcl(x,\lambda)}{\lambda \geq 0\mbox{ and }\nabla_x \mcl(x,\lambda)=0} | + | \sup \{\mcl(x,\lambda,\mu)\,:\lambda \geq 0, \mu \mbox{ and }\nabla_x \mcl(x,\lambda,\mu)=0\} |
| $$ | $$ | ||
| - | To obtain the equality in \eqref{eq:weak} (which is named the <color red>**strong duality relation**</color>), we actually need some additional assumptions (for example the existence of a Slater point). But before that, let us check a very useful relation... | + | This formulation is often useful when solving the optimization problem. |
| - | ====== Karush-Kuhn-Tucker conditions ====== | + | |
| + | To obtain equality in \eqref{eq:weak} (known as the <color red>**strong duality relation**</color>), additional assumptions are required, such as the existence of a Slater point. Before discussing this, we introduce a useful intermediate result. | ||
| + | |||
| + | ===== Karush-Kuhn-Tucker conditions ===== | ||
| ===== Sufficient conditions ===== | ===== Sufficient conditions ===== | ||
| - | <WRAP center round box 80%> | + | <WRAP center round tip 80%> |
| - | __**Lemma**__ | + | **Lemma** |
| - | Assume that there exist $(x^*,\lambda^*) \in \mcD\times (\rset^+)^n$ such that | + | |
| - | \begin{equation} | + | Assume there exist $(x^*, \lambda^*, \mu^*) \in \mcD \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ such that |
| - | \nabla_x \mcl(x^*,\lambda^*)=0 | + | * $\nabla_x \mcl(x^*, \lambda^*, \mu^*) = 0$, |
| - | \end{equation} | + | * for all $i \in [1:n]$, either $\lambda_i^* = 0$ or $h_i(x^*) = 0$. |
| - | and for all $i \in [1:n]$, we have either $\lambda^*_i=0$ or $h_i(x^*)=0$ . | + | |
| - | Then, the <color /yellow>strong duality holds</color> and | + | Then |
| $$ | $$ | ||
| - | f(x^*)=\inf_{x \in \mcD} f(x)=\mcl(x^*,\lambda^*) | + | f(x^*) = \inf_{x \in \mcD} f(x) = \mcl(x^*, \lambda^*, \mu^*), |
| $$ | $$ | ||
| + | and strong duality holds. | ||
| </WRAP> | </WRAP> | ||
| - | <hidden Click to see the proof> | + | |
| - | $\bproof$ | + | <hidden Proof> |
| - | For every $x \in \mcD$, the convexity of the function $f$ yields | + | Let $x \in \mcD$. By convexity of $f$, we have |
| $$ | $$ | ||
| - | f(x)-f(x^*) \geq \nabla f(x^*)^T (x-x^*). | + | f(x) - f(x^*) \geq \nabla f(x^*)^T (x - x^*). |
| $$ | $$ | ||
| - | It remains to prove $\nabla f(x^*)^T (x-x^*)\geq 0$. Using first $\nabla_x \mcl(x^*,\lambda^*)=0$ and then the convexity of the functions $h_i$, we get | + | |
| + | Using the definition of the Lagrangian and the stationarity condition, we obtain | ||
| $$ | $$ | ||
| - | \nabla f(x^*)^T (x-x^*)=-\sum_{i=1}^n \lambda^*_i \nabla_x h_i^T(x^*) (x-x^*) \geq -\sum_{i=1}^n \lambda^*_i (h_i(x)-h_i(x^*))=- \sum_{i=1}^n \lambda^*_i \underbrace{h_i(x)}_{\leq 0} \geq 0 | + | \nabla f(x^*) + \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^m \mu_j^* \nabla g_j(x^*) = 0, |
| - | $$ | + | $$ |
| - | This finishes the proof. $\eproof$ | + | which implies |
| + | \begin{equation} \label{eq:grad} | ||
| + | f(x) - f(x^*) \geq \nabla f(x^*)^T (x - x^*) = - \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*)^T (x - x^*) - \sum_{j=1}^m \mu_j^* \nabla g_j(x^*)^T (x - x^*). | ||
| + | \end{equation} | ||
| + | |||
| + | **Inequality constraints**: $h_i$: by convexity, for any $x \in \mcD$, | ||
| + | $$ | ||
| + | - \nabla h_i(x^*)^T (x - x^*) \ge h_i(x^*) - h_i(x) \ge h_i(x^*), | ||
| + | $$ | ||
| + | hence | ||
| + | $$ | ||
| + | - \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*)^T (x - x^*) \ge \sum_{i=1}^n \lambda_i^* h_i(x^*) = 0. | ||
| + | $$ | ||
| + | |||
| + | **Equality constraints**: $g_j$: since $g_j$ is affine and $x \in \mcD$, we have $g_j(x) = g_j(x^*) = 0$, hence | ||
| + | $$ | ||
| + | \nabla g_j(x^*)^T (x - x^*) = g_j(x) - g_j(x^*)= 0. | ||
| + | $$ | ||
| + | |||
| + | Combining these relations in \eqref{eq:grad}, we obtain that for any $x \in \mcD$, | ||
| + | $$ | ||
| + | f(x) - f(x^*) \ge \nabla f(x^*)^T (x - x^*) \ge 0, | ||
| + | $$ | ||
| + | which proves that $f(x^*) = \inf_{x \in \mcD} f(x)= \mcl(x^*, \lambda^*, \mu^*)$. Since $x \mapsto \mcl(x,\lambda^*,\mu^*)$ is convex, the KKT conditions show that | ||
| + | $$ | ||
| + | \mcl(x^*, \lambda^*, \mu^*) = \inf_{x\in \Xset} \mcl(x, \lambda^*, \mu^*) \leq \sup_{\lambda \geq 0,\mu} \inf_{x\in \Xset} \mcl(x, \lambda, \mu) | ||
| + | $$ | ||
| + | which is the converse inequality in \eqref{eq:weak} and the strong duality holds. | ||
| </hidden> | </hidden> | ||
| + | |||
| ===== Saddle points ===== | ===== Saddle points ===== | ||
| - | <WRAP center round box 80%> | + | <WRAP center round important 80%> |
| - | __**Definition**__ | + | **Definition** |
| - | We say that $(x^*,\lambda^*) \in \Xset \times (\rset^+)^n$ is a saddle point of the Lagrange function $\mcl$ if for every $(x,\lambda) \in \Xset \times (\rset^+)^n$, | + | |
| - | \begin{equation}\label{eq:saddle} | + | We say that $(x^*, \lambda^*, \mu^*) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ is a saddle point of the Lagrange function $\mcl$ if for every $(x, \lambda, \mu) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$, |
| - | \mcl(x^*,\lambda) \leq \mcl(x,\lambda^*) | + | $$ |
| - | \end{equation} | + | \mcl(x^*, \lambda, \mu) \leq \mcl(x, \lambda^*, \mu^*). |
| + | $$ | ||
| </WRAP> | </WRAP> | ||
| - | <WRAP center round box 80%> | + | |
| - | __**Saddle point Lemma**__ | + | <WRAP center round tip 80%> |
| - | If $(x^*,\lambda^*) \in \Xset \times (\rset^+)^n$ is a saddle point for $\mcl$ then the strong duality holds, and the KKT conditions holds for $(x^*,\lambda^*)$. | + | |
| + | **Proposition** | ||
| + | |||
| + | If $(x^*, \lambda^*, \mu^*) \in \Xset \times (\mathbb{R}^+)^n \times \mathbb{R}^m$ is a saddle point for $\mcl$, then strong duality holds, and the KKT conditions are satisfied at $(x^*, \lambda^*, \mu^*)$. | ||
| </WRAP> | </WRAP> | ||
| - | <hidden click here to see the proof> | + | |
| - | $\bproof$ | + | <hidden Proof> |
| - | Using \eqref{eq:saddle} with $\lambda=\lambda^*$ first and then with $x=x^*$, we have | + | By the saddle point property, |
| - | for every $(x,\lambda) \in \Xset \times (\rset^+)^n$, | + | |
| - | \begin{equation*} | + | |
| - | \mcl(x^*,\lambda) \leq \mcl(x^*,\lambda^*) \leq \mcl(x,\lambda^*) | + | |
| - | \end{equation*} | + | |
| - | The existence of a saddle point implies the **strong duality** since | + | |
| $$ | $$ | ||
| - | \inf_{x \in\Xset} \sup_{\lambda \geq 0} \mcl(x,\lambda) \leq \sup_{\lambda \geq 0} \mcl(x^*,\lambda) \leq \mcl(x^*,\lambda^*) \leq \inf_{x \in\Xset} \mcl(x,\lambda^*) \leq \sup_{\lambda \geq 0} \inf_{x \in\Xset} \mcl(x,\lambda) | + | \sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) \leq \inf_{x \in \Xset} \mcl(x, \lambda^*, \mu^*). |
| $$ | $$ | ||
| - | which is the converse inequality of \eqref{eq:weak}. This finally implies: | + | Hence, |
| - | \begin{equation}\label{eq:all} | + | $$ |
| - | \sup_{\lambda \geq 0} \inf_{x \in\Xset} \mcl(x,\lambda)=\inf_{x \in\Xset} \mcl(x,\lambda^*)=\mcl(x^*,\lambda^*)=\inf_{x \in\Xset} \sup_{\lambda \geq 0} \mcl(x,\lambda)=\inf_{x \in\mcD} f(x). | + | \inf_{x \in \Xset} \sup_{\lambda \geq 0, \mu} \mcl(x, \lambda, \mu) |
| - | \end{equation} | + | \leq \sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) |
| + | \leq \inf_{x \in \Xset} \mcl(x, \lambda^*, \mu^*) | ||
| + | \leq \sup_{\lambda \geq 0, \mu} \inf_{x \in \Xset} \mcl(x, \lambda, \mu). | ||
| + | $$ | ||
| + | |||
| + | This chain of inequalities implies strong duality (see \eqref{eq:weak} for the reverse inequality). | ||
| + | |||
| + | Moreover, the upper bound in the saddle point property, which holds for all $\lambda \geq 0, \mu$, implies that $\sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) <\infty$ and hence $h_i(x^*) \leq 0$ and $g_j(x^*) = 0$. | ||
| - | Note that $\mcl(x^*,\lambda^*)=\inf_{x \in\Xset} \mcl(x,\lambda^*)$ shows that $\nabla_x \mcl(x^*,\lambda^*)=0$. | + | Finally, choosing $\lambda = 0$ and $\mu = 0$, we obtain |
| - | + | ||
| - | The upper bound in \eqref{eq:saddle} regardless the value of $\lambda\geq 0$ shows that $h_i(x^*)\leq 0$ for every $i \in [1:n]$, that is $x^* \in \mcD$. Now taking \eqref{eq:saddle} with $\lambda=0$ yields for $x\in \mcD$, | + | |
| $$ | $$ | ||
| - | f(x^*)=\mcl(x^*,0) \leq \mcl(x,\lambda^*) \leq f(x) | + | f(x^*) = \mcl(x^*, 0, 0) \le \sup_{\lambda \geq 0, \mu} \mcl(x^*, \lambda, \mu) |
| - | $$ | + | = \inf_{x \in \Xset} \mcl(x, \lambda^*, \mu^*) \le \mcl(x, \lambda^*, \mu^*) \le f(x), \quad \forall x \in \mcD, |
| - | which shows that $f(x^*)=\inf_{x \in\mcD} f(x)$. Combining with \eqref{eq:all} yields $\mcl(x^*,\lambda^*)=f(x^*)$ so that $\sum_{i=1}^n \lambda^*_i h_i(x^*_i)=0$ and since $\lambda^* \geq 0$ and $x^* \in \mcD$, this implies $\lambda^*_i h_i(x^*)=0$ for all $i \in [1:n]$. | + | $$ |
| - | $\eproof$ | + | which shows that $x^*$ minimizes $f$ over $\mcD$ and that $\lambda_i^* h_i(x^*) = 0$ for all $i$ where the last identity follows from the above inequality with $x=x^*$. |
| </hidden> | </hidden> | ||
| - | ===== Necessary conditions and strong duality ===== | ||
| - | |||
| - | We now assume the existence of Slater points: there exists $\tilde x \in \mcD$ such that for all $i \in \{1,\ldots,n\}$, $h_i(\tilde x)<0$. | ||
| - | <WRAP center round box 80%> | + | ==== Convex Farkas lemma ==== |
| - | __**The convex Farkas lemma**__ | + | |
| - | Assume that there exists a Slater point. Then, $\{f<0\} \cap \mcD =\emptyset$ iff there exists $\lambda^* \geq 0$ such that for all $x\in \Xset$, $f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) \geq 0$. | + | We now assume the existence of a Slater point, that is, there exists $\tilde{x} \in \mcD$ such that for all $i \in \{1, \ldots, n\}$, $h_i(\tilde{x}) < 0$. |
| + | |||
| + | <WRAP center round alert 80%> | ||
| + | === Theorem (Convex Farkas Lemma) === | ||
| + | |||
| + | $$ | ||
| + | \{x \in \mcD : f(x) < 0\} = \emptyset \iff \exists \lambda^* \ge 0,\ \mu^* \in \mathbb{R}^m \text{ s.t. } f(x) + \sum_{i=1}^n \lambda_i^* h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x) \ge 0 \ \forall x \in \Xset. | ||
| + | $$ | ||
| </WRAP> | </WRAP> | ||
| + | <hidden Proof> | ||
| + | Recall that | ||
| + | $$ | ||
| + | g_j(x) = a_j^T x - b_j, \quad j=1,\dots,m. | ||
| + | $$ | ||
| + | Without loss of generality, we assume that $(a_j)_{1\leq j \leq m}$ are **linearly independent**. | ||
| + | We define the set $U$ as | ||
| + | $$ | ||
| + | U = \{ u = (u_0, u_1, \dots, u_n, u_{n+1}, \dots, u_{n+m}) \in \rset^{n+m+1}: \exists x \in \Xset, f(x) < u_0, h_i(x) \le u_i, g_j(x) = u_{n+j} \}. | ||
| + | $$ | ||
| - | <hidden Click here to see the proof> | + | First note that the condition $\{x \in \mcD : f(x) < 0\} = \emptyset$ is equivalent to $0 \notin U$. Since $U$ is convex, a separation argument shows that $0 \notin U$ if and only if there exists a nonzero vector $\phi \in \mathbb{R}^{n+m+1}$ such that |
| - | $\bproof$ | + | |
| - | Set | + | |
| $$ | $$ | ||
| - | U=\{u=u_{0:n} \in \rset^{n+1}; \exists\ x \in \Xset, f(x)<u_0 \ \mbox{and} \ h_i(x)\leq u_i \mbox{ for all } i \in [1:n]\}. | + | \phi^T u \ge 0, \quad \forall u \in U. |
| $$ | $$ | ||
| - | The condition $\{f<0\} \cap \mcD =\emptyset$ is equivalent to saying that $0 \notin U$ and since $U$ is clearly a convex set, by a separation argument, there exists a __**non-null**__ vector $\phi \in \rset^{n+1}$ such that | + | |
| - | \begin{equation} \label{eq:ineg} | + | |
| - | \forall u \in U\,, \quad \phi^T u\geq 0 | + | |
| - | \end{equation} | + | |
| Take an arbitrary $i \in [0:n]$. | Take an arbitrary $i \in [0:n]$. | ||
| - | If $u\in U$ then $u+t e_i \in U$ where $t>0$ and $e_i=(\indiacc{k=i})_{k \in [0:n]} \in \rset^{n+1}$. The previous inequality implies $\phi^T (u+t e_i)=\phi^T u + t\phi_i \geq 0$ for all $t>0$, therefore $\phi_i\geq 0$. Now, since $\phi^T u\geq 0$ for all $u \in U$, a simple limiting argument yields for all $x\in \Xset$, | + | If $u\in U$ then $u+t e_i \in U$ where $t>0$ and $e_i=(\indiacc{k=i})_{k \in [0:n+m]} \in \rset^{n+m+1}$. The previous inequality implies $\phi^T (u+t e_i)=\phi^T u + t\phi_i \geq 0$ for all $t>0$, therefore $\phi_i\geq 0$ for any $i \in [0:n]$. Now, since $\phi^T u\geq 0$ for all $u \in U$, a simple limiting argument yields for all $x\in \Xset$, |
| + | $$ | ||
| + | \phi_0 f(x)+\sum_{i=1}^n \phi_i h_i(x) +\sum_{j=n+1}^{n+m} \phi_j g_j(x)\geq 0 | ||
| + | $$ | ||
| + | To conclude, and since we already know that $\phi_i\geq 0$ for all $i \in [0:n]$, it only remains to prove that $\phi_0 \neq 0$ and to set in that case $\lambda^*_i=\frac{\phi_i}{\phi_0}$ for $i\in \lrcb{1,\ldots,n}$ and $\mu_j^*=\frac{\phi_{n+j}}{\phi_0}$ for $j \in \lrcb{1,\ldots,m}$. The rest of the argument is by contradiction. If $\phi_0=0$, then the previous inequality on a Slater point $x=\tilde x \in \mcD $ yields $\sum_{i=1}^n \phi_i h_i(\tilde x) \geq 0$ but since $h_i(\tilde x)<0$ for all $i \in [1:n]$, we finally get $\phi_i=0$ for all $i \in [1:n]$. Then the previous inequality becomes for any $x\in \Xset$, | ||
| $$ | $$ | ||
| - | \phi_0 f(x)+\sum_{i=1}^n \phi_i h_i(x) \geq 0 | + | \sum_{j=n+1}^{n+m} \phi_j g_j(x)= \lr{\sum_{j=n+1}^{n+m} \phi_j a_j}^T x + \sum_{j=n+1}^{n+m} \phi_j b_j\geq 0 |
| $$ | $$ | ||
| - | To conclude, and since we already know that $\phi_i\geq 0$ for all $i \in [0:n]$, it only remains to prove that $\phi_0 \neq 0$ and to set in that case $\lambda^*_i=\frac{\phi_i}{\phi_0}$. The rest of the argument is by contradiction. If $\phi_0=0$, then the previous inequality on a Slater point $x=\tilde x$ yields $\sum_{i=1}^n \phi_i h_i(\tilde x) \geq 0$ but since $h_i(\tilde x)<0$ for all $i \in [1:n]$, we finally get $\phi_i=0$ for all $i \in [1:n]$. Finally all the components of $\phi$ are null and we are face to a contradiction. The proof is completed. | + | which in turn implies that $\sum_{j=n+1}^{n+m} \phi_j a_j=0$ and since $(a_j)_{n+1\leq j \leq n+m}$ are linearly independent, we deduce that $\phi_j=0$ for any $j\in [n+1:n+m]$. Finally all the components of $\phi$ are null and we are face to a contradiction. The proof is completed. |
| - | $\eproof$ | + | |
| </hidden> | </hidden> | ||
| - | The Farkas lemma will imply the strong duality under the existence of a Slater point. | + | |
| - | <WRAP center round box 80%> | + | |
| - | __**The necessary condition**__ | + | <WRAP center round tip 80%> |
| - | Assume that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in\mcD$ and that there exists a Slater point. Then there exist $\lambda^* \geq 0$ such that $(x^*,\lambda^*)$ is a saddle point and by the Saddle point lemma, the <color /yellow>strong duality holds</color>. | + | === Theorem (KKT conditions) === |
| + | |||
| + | If $x^* \in \mcD$ minimizes $f$ and a Slater point exists, then there exist $\lambda^* \ge 0$ and $\mu^* \in \mathbb{R}^m$ such that: $(x^*,\lambda^*,\mu^*)$ is a saddle point and hence, KKT conditions hold: | ||
| + | |||
| + | $$ | ||
| + | \nabla_x f(x^*) + \sum_{i=1}^n \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^m \mu_j^* \nabla g_j(x^*) = 0 | ||
| + | $$ | ||
| + | |||
| + | $$ | ||
| + | h_i(x^*) \le 0,\ \lambda_i^* \ge 0,\ \lambda_i^* h_i(x^*) = 0 \quad \forall i | ||
| + | $$ | ||
| + | |||
| + | $$ | ||
| + | g_j(x^*) = 0 \quad \forall j | ||
| + | $$ | ||
| + | |||
| + | and the strong duality is satisfied. | ||
| </WRAP> | </WRAP> | ||
| - | <hidden Click here to see the proof> | + | |
| - | $\bproof$ | + | <hidden Proof> |
| - | Assume now that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in\mcD$. Then, $\{f-f(x^*)<0\} \cap \mcD =\emptyset$ so that there exists $\lambda^* \geq 0$ satisfying for all $x\in \Xset$, $f(x)-f(x^*)+\sum_{i=1}^n \lambda^*_i h_i(x) \geq 0$. And this implies that $(x^*,\lambda^*)$ is a saddle point since: for all $\lambda \geq 0$ and $x \in \Xset$, | + | Assume that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in \mcD$. Then $\{f-f(x^*)<0\} \cap \mcD = \emptyset$. By the Farkas lemma, there exist $\lambda^* \geq 0$ and $\mu^* \in \mathbb{R}^m$ such that for all $x\in \Xset$, |
| $$ | $$ | ||
| - | f(x^*)+\sum_{i=1}^n \lambda_i h_i(x^*) \leq f(x^*) \leq f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) | + | f(x)-f(x^*)+\sum_{i=1}^n \lambda^*_i h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x)\geq 0. |
| $$ | $$ | ||
| - | so that $(x^*,\lambda^*)$ is a saddle point and this implies that the strong duality holds (as we have seen in the previous section). This concludes the proof. | + | |
| - | $\eproof$ | + | This implies that $(x^*,\lambda^*,\mu^*)$ is a saddle point. Indeed, for all $\lambda \geq 0$, $\mu \in \rset^m$ and $x \in \Xset$, |
| + | $$ | ||
| + | f(x^*)+\sum_{i=1}^n \lambda_i h_i(x^*) + \sum_{j=1}^m \mu_j g_j(x^*)\leq f(x^*) \leq f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) + \sum_{j=1}^m \mu_j^* g_j(x) | ||
| + | $$ | ||
| + | |||
| + | Therefore, $(x^*,\lambda^*,\mu^*)$ is a saddle point, which implies strong duality (as shown previously). This concludes the proof. | ||
| </hidden> | </hidden> | ||
| + | ==== Application of the KKT conditions for a Wasserstein result ==== | ||
| + | Let $X_1,\ldots, X_n$ be $n$ points in $\mathbb{R}$ ordered in non-decreasing order. | ||
| + | Let $Y_1,\ldots, Y_n$ be $n$ other points in $\mathbb{R}$, also ordered in non-decreasing order. | ||
| + | |||
| + | <WRAP center round todo 80%> | ||
| + | === Proposition === | ||
| + | |||
| + | $$ | ||
| + | \mathcal{W}_p \left(\frac{1}{n}\sum_{i=1}^n \delta_{X_i}, \frac{1}{n}\sum_{i=1}^n \delta_{Y_i}\right) | ||
| + | = \left( \frac{1}{n} \sum_{i=1}^n |X_i - Y_i|^p \right)^{1/p}. | ||
| + | $$ | ||
| + | |||
| + | </WRAP> | ||
| + | |||
| + | <hidden Proof> | ||
| + | In order to prove the proposition, we will show that | ||
| + | $$ | ||
| + | \mathrm{argmin}\left\{ \sum_{i,j=1}^n p_{ij} |X_i- Y_j|^p \,: \forall i,j,\ p_{ij} \geq 0, \forall i, \sum_{j} p_{ij}=\frac{1}{N} , \forall j, \sum_{i} p_{ij}=\frac{1}{N} \right\}= \left[ \frac{1}{N} \mathsf{1}(i=j) \right]_{1\leq i,j \leq n}. | ||
| + | $$ | ||
| + | |||
| + | The function $p \mapsto \sum_{i,j} p_{ij}|X_i-Y_j|^p$ is linear, hence convex. Moreover, the constraints are affine: for all $i\in [1:n]$, $\sum_{j=1}^n p_{ij}=\frac{1}{N}$ and for all $j\in [1:n]$, $\sum_{i=1}^n p_{ij}=\frac{1}{N}$, together with inequality constraints $\forall i,j \in [1:n],\ -p_{ij} \leq 0$. | ||
| + | |||
| + | Therefore, we can apply the KKT theorem. We seek $p\in \rset^{n\times n}$, $\alpha,\beta \in \rset^n$, and $\gamma \in \rset^{n \times n}$ such that, defining | ||
| + | $$ | ||
| + | \mathcal{L}(p,\alpha,\beta,\gamma)=\sum_{i,j=1}^n p_{ij}|X_i-Y_j|^p - \gamma_{ij} p_{ij}+\sum_{i=1}^n \alpha_i \lr{\sum_{j=1}^n p_{ij}- \frac 1 N} +\sum_{j=1}^n \beta_j \lr{\sum_{i=1}^n p_{ij}- \frac 1 N}, | ||
| + | $$ | ||
| + | we have $\nabla_p \mathcal{L}(p,\alpha,\beta,\gamma)=0$, together with the conditions $\gamma_{ij} p_{ij}=0$ for all $i,j \in [1:n]$ and $\sum_{j=1}^np_{ij}=\sum_{i=1}^n p_{ij}=1/N$. | ||
| + | |||
| + | We already know that $p_{ij}=\frac 1 N \indi{i=j}$ is a solution. It therefore remains to find $\alpha,\beta,\gamma$ such that the KKT conditions are satisfied for this choice of $p$. These conditions can be written as: | ||
| + | * for all $i,j \in [1:n]$, $\partial_{p_{ij}} \mathcal{L}(p,\alpha,\beta,\gamma)=|X_i-Y_j|^p - \gamma_{ij} +\alpha_i +\beta_j=0$, | ||
| + | * for all $i,j \in [1:n]$, $\gamma_{ij} \geq 0$ and $\gamma_{ij} p_{ij}=0$. | ||
| + | |||
| + | The complementarity condition reduces to $\gamma_{ii}=0$, since $p_{ij}=\frac 1 N \indi{i=j}$. Hence, the KKT conditions are equivalent to: for all $i,j \in [1:n]$, | ||
| + | $\gamma_{ij}=\alpha_i+\beta_j + |X_i-Y_j|^p \geq 0$ and $\gamma_{ii}=\alpha_i+\beta_i + |X_i-Y_i|^p=0$. | ||
| + | |||
| + | This is in turn equivalent to the existence of $\beta \in \rset^n$ such that | ||
| + | * $\forall i,j \in [1:n], \ \beta_j - \beta_i \geq |X_i-Y_i|^p-|X_i-Y_j|^p$. | ||
| + | |||
| + | Set $\beta_1=0$ and, for all $i\in [1:n-1]$, define $\beta_j=\sum_{\ell=1}^{j-1} \lrcb{|X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p}$. Then, | ||
| + | $$ | ||
| + | \beta_j - \beta_i=\sum_{\ell=i}^{j-1} \lrcb{|X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p}. | ||
| + | $$ | ||
| + | |||
| + | Since $u \mapsto |u|^p$ is convex, we have $|a+c|^p-|a|^p \geq |b+c|^p-|b|^p$ for all $a \geq b$ and $c\geq 0$. We set $a=X_\ell -Y_{\ell+1}$, $b=X_i-Y_{\ell+1}$, and $c=Y_{\ell+1}-Y_\ell$. For $\ell \geq i$, we have $a \geq b$ and $c\geq 0$. Hence, for any $\ell \geq i$, the inequality can be rewritten as | ||
| + | $$ | ||
| + | |X_\ell-Y_\ell|^p - |X_\ell - Y_{\ell+1}|^p \geq |X_i-Y_\ell|^p - |X_i - Y_{\ell+1}|^p. | ||
| + | $$ | ||
| + | |||
| + | Finally, plugging into the previous identity yields | ||
| + | $$ | ||
| + | \beta_j-\beta_i \geq \sum_{\ell=i}^{j-1} |X_i-Y_\ell|^p - |X_i - Y_{\ell+1}|^p = |X_i-Y_i|^p - |X_i - Y_j|^p. | ||
| + | $$ | ||
| + | |||
| + | This proves the KKT conditions, and the proof is complete. | ||
| + | </hidden> | ||
| + | |||
| + | The proposition shows that, for any $p \geq 1$, | ||
| + | $$ | ||
| + | \mathcal{W}_p(\mu,\nu) | ||
| + | = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}, | ||
| + | $$ | ||
| + | |||
| + | where | ||
| + | $$ | ||
| + | \mu=\frac{1}{n} \sum_{i=1}^n \delta_{X_i}, | ||
| + | \qquad | ||
| + | \nu=\frac{1}{n} \sum_{i=1}^n \delta_{Y_i}. | ||
| + | $$ | ||
| + | |||
| + | We only assume that the sequences $(X_i)$ and $(Y_i)$ are ordered in non-decreasing order; in particular, repetitions are allowed. | ||
| + | |||
| + | We then extend this result to arbitrary discrete probability measures. Let | ||
| + | $$ | ||
| + | \mu=\sum_{i=1}^n \lambda_i \delta_{X_i}, | ||
| + | \qquad | ||
| + | \nu=\sum_{j=1}^m \gamma_j \delta_{Y_j}, | ||
| + | $$ | ||
| + | |||
| + | where the coefficients $(\lambda_i)$ and $(\gamma_j)$ are non-negative rational numbers summing to $1$. By the previous result, we still have | ||
| + | $$ | ||
| + | \mathcal{W}_p(\mu,\nu) | ||
| + | = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}. | ||
| + | $$ | ||
| + | |||
| + | By a density argument, this identity extends to coefficients $(\lambda_i)$ and $(\gamma_j)$ with non-negative real values summing to $1$. Finally, by approximation, we obtain that for any probability measures $\mu$ and $\nu$ on $(\rset,\mathcal{B}(\rset))$, | ||
| + | $$ | ||
| + | \mathcal{W}_p(\mu,\nu) | ||
| + | = \left( \int_0^1 \left|F_\mu^{-1}(u)- F_\nu^{-1}(u)\right|^p \, \rmd u \right)^{1/p}. | ||
| + | $$ | ||