$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bbQ}{{\mathbb Q}} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\condtrans}[3]{p_{#1}(#2|#3)} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\cro}[1]{\langle #1 \rangle} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Hset}{\mathsf{H}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\jointtrans}[3]{p_{#1}(#2,#3)} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} \newcommand{\lrb}[1]{\left[#1 \right]} \newcommand{\lrc}[1]{\left\{#1 \right\}} \newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcf}{\mathcal{F}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\N}{\mathcal{N}} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\pminfty}{_{-\infty}^\infty} \newcommand{\PP}{\mathbb P} \newcommand{\projorth}[1]{\mathsf{P}^\perp_{#1}} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\revcondtrans}[3]{q_{#1}(#2|#3)} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\weaklim}[1]{\stackrel{\mathcal{L}_{#1}}{\rightsquigarrow}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} \newcommand{\Zset}{\mathsf{Z}} $$

2017/10/07 23:39 · douc

karush kuhn tucker, kkt, farkas lemma

$$ \newcommand{\mcD}{\mathcal D} \newcommand{\mcl}{\mathcal L} $$

Weak duality

Let $f$ and $(h_i)_{1 \leq i \leq n}$ be convex differentiable functions on $\Xset=\rset^p$. Let $\mcD=\cap_{i=1}^n \{h_i \leq 0\}\neq \emptyset$ and note that by convexity of the functions $h_i$, the set $\mcD$ is actually a convex set. We are interested in the infimum of the convex function $f$ on the convex set $\mcD$. Define the Lagrange function $$ \mcl(x,\lambda)=f(x)+\sum_{i=1}^n \lambda_i h_i(x) $$ where $\lambda=(\lambda_1,\ldots,\lambda_n)^T \in \rset^n$. We first show the weak duality relation. Start with this simple remark: for all $(x,\lambda) \in \Xset \times \rset^n$, $$ \inf_{x \in \Xset} \mcl(x,\lambda) \leq \mcl(x,\lambda) $$ Taking the supremum wrt $\lambda \geq 0$ on both sides yields: $$ \sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda) \leq \sup_{\lambda\geq 0} \mcl(x,\lambda)=\infty \indin{x\notin \mcD}+f(x)\indin{x\in \mcD} $$ And taking now the infimum wrt $x \in \Xset$ on both sides, we finally get the weak duality relation: \begin{equation} \label{eq:weak} \sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda) \leq \inf_{x \in \Xset} \sup_{\lambda\geq 0}\mcl(x,\lambda)=\inf_{x \in \mcD} f(x) \end{equation} Note that in the lhs (left-hand side), the infimum is wrt $x\in \Xset$ and therefore, there is no constraint, which is nice… The rhs (right-hand side) is called the primal problem and the lhs the dual problem. Note, since $x\mapsto \mcl(x,\lambda)$ is convex, that the dual problem $\sup_{\lambda\geq 0}\inf_{x \in \Xset} \mcl(x,\lambda)$ is equivalent to $$ \sup \set{\mcl(x,\lambda)}{\lambda \geq 0\mbox{ and }\nabla_x \mcl(x,\lambda)=0} $$

To obtain the equality in \eqref{eq:weak} (which is named the strong duality relation), we actually need some additional assumptions (for example the existence of a Slater point). But before that, let us check a very useful relation…

Karush-Kuhn-Tucker conditions

Sufficient conditions

Lemma Assume that there exist $(x^*,\lambda^*) \in \mcD\times (\rset^+)^n$ such that \begin{equation} \nabla_x \mcl(x^*,\lambda^*)=0 \end{equation} and for all $i \in [1:n]$, we have either $\lambda^*_i=0$ or $h_i(x^*)=0$ . Then, the strong duality holds and $$ f(x^*)=\inf_{x \in \mcD} f(x)=\mcl(x^*,\lambda^*) $$

Click to see the proof

$\bproof$ For every $x \in \mcD$, the convexity of the function $f$ yields $$ f(x)-f(x^*) \geq \nabla f(x^*)^T (x-x^*). $$ It remains to prove $\nabla f(x^*)^T (x-x^*)\geq 0$. Using first $\nabla_x \mcl(x^*,\lambda^*)=0$ and then the convexity of the functions $h_i$, we get $$ \nabla f(x^*)^T (x-x^*)=-\sum_{i=1}^n \lambda^*_i \nabla_x h_i^T(x^*) (x-x^*) \geq -\sum_{i=1}^n \lambda^*_i (h_i(x)-h_i(x^*))=- \sum_{i=1}^n \lambda^*_i \underbrace{h_i(x)}_{\leq 0} \geq 0 $$ This finishes the proof. $\eproof$

Saddle points

Definition We say that $(x^*,\lambda^*) \in \Xset \times (\rset^+)^n$ is a saddle point of the Lagrange function $\mcl$ if for every $(x,\lambda) \in \Xset \times (\rset^+)^n$, \begin{equation}\label{eq:saddle} \mcl(x^*,\lambda) \leq \mcl(x,\lambda^*) \end{equation}

Saddle point Lemma If $(x^*,\lambda^*) \in \Xset \times (\rset^+)^n$ is a saddle point for $\mcl$ then the strong duality holds, and the KKT conditions holds for $(x^*,\lambda^*)$.

click here to see the proof

$\bproof$ Using \eqref{eq:saddle} with $\lambda=\lambda^*$ first and then with $x=x^*$, we have for every $(x,\lambda) \in \Xset \times (\rset^+)^n$, \begin{equation*} \mcl(x^*,\lambda) \leq \mcl(x^*,\lambda^*) \leq \mcl(x,\lambda^*) \end{equation*} The existence of a saddle point implies the strong duality since $$ \inf_{x \in\Xset} \sup_{\lambda \geq 0} \mcl(x,\lambda) \leq \sup_{\lambda \geq 0} \mcl(x^*,\lambda) \leq \mcl(x^*,\lambda^*) \leq \inf_{x \in\Xset} \mcl(x,\lambda^*) \leq \sup_{\lambda \geq 0} \inf_{x \in\Xset} \mcl(x,\lambda) $$ which is the converse inequality of \eqref{eq:weak}. This finally implies: \begin{equation}\label{eq:all} \sup_{\lambda \geq 0} \inf_{x \in\Xset} \mcl(x,\lambda)=\inf_{x \in\Xset} \mcl(x,\lambda^*)=\mcl(x^*,\lambda^*)=\inf_{x \in\Xset} \sup_{\lambda \geq 0} \mcl(x,\lambda)=\inf_{x \in\mcD} f(x). \end{equation}

Note that $\mcl(x^*,\lambda^*)=\inf_{x \in\Xset} \mcl(x,\lambda^*)$ shows that $\nabla_x \mcl(x^*,\lambda^*)=0$.

The upper bound in \eqref{eq:saddle} regardless the value of $\lambda\geq 0$ shows that $h_i(x^*)\leq 0$ for every $i \in [1:n]$, that is $x^* \in \mcD$. Now taking \eqref{eq:saddle} with $\lambda=0$ yields for $x\in \mcD$, $$ f(x^*)=\mcl(x^*,0) \leq \mcl(x,\lambda^*) \leq f(x) $$ which shows that $f(x^*)=\inf_{x \in\mcD} f(x)$. Combining with \eqref{eq:all} yields $\mcl(x^*,\lambda^*)=f(x^*)$ so that $\sum_{i=1}^n \lambda^*_i h_i(x^*_i)=0$ and since $\lambda^* \geq 0$ and $x^* \in \mcD$, this implies $\lambda^*_i h_i(x^*)=0$ for all $i \in [1:n]$. $\eproof$

Necessary conditions and strong duality

We now assume the existence of Slater points: there exists $\tilde x \in \mcD$ such that for all $i \in \{1,\ldots,n\}$, $h_i(\tilde x)<0$.

The convex Farkas lemma Assume that there exists a Slater point. Then, $\{f<0\} \cap \mcD =\emptyset$ iff there exists $\lambda^* \geq 0$ such that for all $x\in \mcD$, $f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) \geq 0$.

Click here to see the proof

$\bproof$ Set $$ U=\{u=u_{0:n} \in \rset^{n+1}; \exists\ x \in \mcD, f(x)<u_0 \ \mbox{and} \ h_i(x)\leq u_i \mbox{ for all } i \in [1:n]\}. $$ The condition $\{f<0\} \cap \mcD =\emptyset$ is equivalent to saying that $0 \notin U$ and since $U$ is clearly a convex set, by a separation argument, there exists a non-null vector $\phi \in \rset^{n+1}$ such that \begin{equation} \label{eq:ineg} \forall u \in U\,, \quad \phi^T u\geq 0 \end{equation} Take an arbitrary $i \in [0:n]$. If $u\in U$ then $u+t e_i \in U$ where $t>0$ and $e_i=(\indiacc{k=i})_{k \in [0:n]} \in \rset^{n+1}$. The previous inequality implies $\phi^T (u+t e_i)=\phi^T u + t\phi_i \geq 0$ for all $t>0$, therefore $\phi_i\geq 0$. Now, since $\phi^T u\geq 0$ for all $u \in U$, a simple limiting argument yields for all $x\in \mcD$, $$ \phi_0 f(x)+\sum_{i=1}^n \phi_i h_i(x) \geq 0 $$ To conclude, and since we already know that $\phi_i\geq 0$ for all $i \in [0:n]$, it only remains to prove that $\phi_0 \neq 0$ and to set in that case $\lambda^*_i=\frac{\phi_i}{\phi_0}$. The rest of the argument is by contradiction. If $\phi_0=0$, then the previous inequality on a Slater point $x=\tilde x$ yields $\sum_{i=1}^n \phi_i h_i(\tilde x) \geq 0$ but since $h_i(\tilde x)<0$ for all $i \in [1:n]$, we finally get $\phi_i=0$ for all $i \in [1:n]$. Finally all the components of $\phi$ are null and we are face to a contradiction. The proof is completed. $\eproof$

The Farkas lemma will imply the strong duality under the existence of a Slater point.

The necessary condition Assume that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in\mcD$ and that there exists a Slater point. Then there exist $\lambda^* \geq 0$ such that $(x^*,\lambda^*)$ is a saddle point and by the Saddle point lemma, the strong duality holds.

Click here to see the proof

$\bproof$ Assume now that $f(x^*)=\inf_{x\in \mcD} f(x)$ for some $x^* \in\mcD$. Then, $\{f-f(x^*)<0\} \cap \mcD =\emptyset$ so that there exists $\lambda^* \geq 0$ satisfying for all $x\in \mcD$, $f(x)-f(x^*)+\sum_{i=1}^n \lambda^*_i h_i(x) \geq 0$. And this implies that $(x^*,\lambda^*)$ is a saddle point since: for all $\lambda \geq 0$ and $x \in \mcD$, $$ f(x^*)+\sum_{i=1}^n \lambda_i h_i(x^*) \leq f(x^*) \leq f(x)+\sum_{i=1}^n \lambda^*_i h_i(x) $$ so that $(x^*,\lambda^*)$ is a saddle point and this implies that the strong duality holds (as we have seen in the previous section). This concludes the proof. $\eproof$

Welcome to Randal Douc's wiki

Sidebar

Wiki

Wiki

Courses and public working groups

Courses and public working groups

Private Working Groups

Private Working Groups

Personal Notes

Personal Notes

Réponses

Réponses

Miscellanous

Miscellanous

Table of Contents

Weak duality

Karush-Kuhn-Tucker conditions

Sufficient conditions

Saddle points

Necessary conditions and strong duality

Welcome to Randal Douc's wiki

User Tools

Site Tools

Sidebar

Wiki

Wiki

Courses and public working groups

Courses and public working groups

Private Working Groups

Private Working Groups

Personal Notes

Personal Notes

Réponses

Réponses

Miscellanous

Miscellanous

Table of Contents

Weak duality

Karush-Kuhn-Tucker conditions

Sufficient conditions

Saddle points

Necessary conditions and strong duality

Page Tools