KKT multiplier

Overview

The KKT (Karush-Kuhn-Tucker) multiplier method is an extension of the Lagrange multiplier method that handles both equality and inequality constraints. However, unlike Lagrange multipliers, KKT multipliers do not convert the constrained optimization problem into a system of equations; instead, they provide necessary conditions on the solution that we can search for.

For a convex objective (with convex constraints), there is only one extremum, so the KKT conditions are also sufficient.

Statement

let x’ = argmin[x] f(x) s.t. h[i](x) = 0 && g[j](x) >= 0 for i=1..M, j=1..N
let L(x,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*g[j](x) )

Then the following conditions hold at x’
1. D[L(x,a,b),x](x’) = 0
2. h[i](x’) = 0 for i=1..M (given)
3. g[j](x’) >= 0 for j=1..N (given)
4. b[j] <= 0 for j=1..N
5. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)
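As a minimal sketch of the conditions above, consider a hypothetical one-variable example (not from the original text): minimize f(x) = x^2 subject to g(x) = x - 1 >= 0. The constrained minimum is x’ = 1, and stationarity of L(x,b) = x^2 + b*(x-1) gives b = -2, which satisfies all the conditions (there is no equality constraint here, so condition 2 is vacuous):

```python
# Hypothetical example: minimize f(x) = x^2 subject to g(x) = x - 1 >= 0.
# The constrained minimum is x' = 1; dL/dx = 2x + b = 0 there gives b = -2.

def f(x):
    return x ** 2

def g(x):
    return x - 1

x_star = 1.0          # argmin of f over {x : g(x) >= 0}
b = -2.0 * x_star     # multiplier from stationarity: 2*x' + b = 0

stationarity = 2 * x_star + b    # condition 1: must equal 0
feasibility = g(x_star)          # condition 3: must be >= 0
sign = b                         # condition 4: must be <= 0
slackness = b * g(x_star)        # condition 5: must equal 0
```

Note that the constraint is active (g(x’) = 0), so complementary slackness holds with b nonzero; an inactive constraint would instead force b = 0.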

Proof

Lemma 1

consider the univariate optimization x’ = argmin[x] f(x) s.t. x >= 0
x’ must fall into one of the following cases
1. it is an interior point, i.e. x’ > 0
2. it is on the boundary, i.e. x’ = 0
for case 1, we have D[f(x),x](x’) = 0, as at an ordinary unconstrained minimum
for case 2, we have D[f(x),x](x’) >= 0, because f(x’) must be no larger than f at its feasible neighbours: as x moves from 0 toward positive infinity, f(x) must be locally non-decreasing, so the derivative at 0 cannot be negative

so, in all cases the following hold
1. x’ >= 0
2. D[f(x),x](x’) >= 0
3. x’ * D[f(x),x](x’) = 0
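The two cases of Lemma 1 can be checked numerically. Below is a small sketch with two hypothetical functions (not from the original text): one whose minimizer over x >= 0 is interior, and one whose minimizer sits on the boundary x = 0:

```python
# Numeric check of Lemma 1 for minimization over x >= 0.

def d(f, x, h=1e-6):
    # central-difference approximation of D[f(x), x](x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Case 1 (interior): f(x) = (x - 1)^2, minimizer x' = 1 > 0, so f'(x') = 0.
f1 = lambda x: (x - 1) ** 2
x1 = 1.0

# Case 2 (boundary): f(x) = (x + 1)^2, minimizer over x >= 0 is x' = 0,
# where f'(0) = 2 >= 0 (f increases as x moves into the feasible region).
f2 = lambda x: (x + 1) ** 2
x2 = 0.0

for x_star, fn in [(x1, f1), (x2, f2)]:
    assert x_star >= 0                          # 1. feasibility
    assert d(fn, x_star) >= -1e-6               # 2. derivative non-negative
    assert abs(x_star * d(fn, x_star)) < 1e-6   # 3. complementarity
```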

for max f(x) s.t. x >= 0, condition 2 changes to D[f(x),x](x’) <= 0

Proof of the statement

First, introduce a slack variable for each inequality, i.e.
convert g(x) >= 0 into g(x) = s && s >= 0

then the original problem statement becomes
min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0 && s[j] >= 0 for i=1..M, j=1..N
(NOTE: this does not change the problem or its solution)

Ignoring s[j] >= 0 for a moment, this is an equality-constrained optimization,
so from the proof of the Lagrange multiplier method, we have
min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0
= min L(x,s,a,b)
where
L(x,s,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*(g[j](x)-s[j]) )

Differentiating L(x,s,a,b) with respect to x, a, and b gives conditions 1, 2, and 3.

Now consider min L(x,s,a,b) over s[j]. From Lemma 1, we have
1. s[j] >= 0
2. D[ L(x,s,a,b), s[j] ](s[j]) >= 0
3. s[j] * D[L(x,s,a,b),s[j]](s[j]) = 0
Since D[ L(x,s,a,b), s[j] ] = -b[j], and s[j] = g[j](x') at the solution, substituting back into 2 & 3 gives
2. b[j] <= 0 for j=1..N
3. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)

These are condition 4 and 5.
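To see all five conditions at once, here is a hypothetical two-variable example (an illustration, not from the original text): minimize f(x,y) = x^2 + y^2 subject to h(x,y) = x + y - 1 = 0 and g(x,y) = x - 0.8 >= 0. The inequality is active at the optimum, so the KKT system can be solved directly for x’, y’, a, b and all five conditions verified:

```python
# Hypothetical example: minimize f(x,y) = x^2 + y^2
# subject to h(x,y) = x + y - 1 = 0 and g(x,y) = x - 0.8 >= 0.
# L(x,y,a,b) = x^2 + y^2 + a*(x + y - 1) + b*(x - 0.8)
# KKT system with g active: 2x + a + b = 0, 2y + a = 0, x + y = 1, x = 0.8

x, y = 0.8, 0.2       # from x = 0.8 (active g) and x + y = 1
a = -2 * y            # from dL/dy = 2y + a = 0
b = -2 * x - a        # from dL/dx = 2x + a + b = 0

assert abs(2 * x + a + b) < 1e-12    # condition 1: stationarity in x
assert abs(2 * y + a) < 1e-12        # condition 1: stationarity in y
assert abs(x + y - 1) < 1e-12        # condition 2: h(x', y') = 0
assert x - 0.8 >= 0                  # condition 3: g(x', y') >= 0
assert b <= 0                        # condition 4: multiplier sign
assert abs(b * (x - 0.8)) < 1e-12    # condition 5: complementary slackness
```

Note that b <= 0 here matches the sign convention used above (L = f + b*g with g >= 0 in a minimization); texts that instead write L = f - b*g require b >= 0, which is the same condition.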
