## Overview

KKT (Karush–Kuhn–Tucker) multipliers extend Lagrange multipliers to handle both equality and inequality constraints. However, unlike the Lagrange multiplier method, the KKT conditions do not convert a constrained optimization problem into a system of equations; instead, they provide necessary conditions that any solution must satisfy, which we can then search for.

For a convex objective over a convex feasible region, there is only one extremum, so the KKT conditions are also sufficient.

## Statement

Let x’ = argmin[x] f(x) s.t. h[i](x) = 0 && g[j](x) >= 0 for i=1..M, j=1..N

Let L(x,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*g[j](x) )

The following statements hold at x’:

1. D[L(x,a,b),x](x’) = 0

2. h[i](x’) = 0 for i=1..M (given)

3. g[j](x’) >= 0 for j=1..N (given)

4. b[j] <= 0 for j=1..N

5. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)
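As a concrete check, the sketch below verifies conditions 1 and 3–5 on a tiny assumed example (min f(x) = (x+1)^2 s.t. g(x) = x >= 0, whose minimizer is x’ = 0; there is no equality constraint, so condition 2 is vacuous). The names f, g, df, dg and the multiplier value b = -2 are illustrative, and b <= 0 follows this note's sign convention L = f + b*g:

```python
# Assumed example: min f(x) = (x+1)^2  s.t.  g(x) = x >= 0.
# The constrained minimizer is x' = 0; the multiplier b = -2 solves
# df(x') + b*dg(x') = 0 under this note's convention L(x,b) = f(x) + b*g(x).

def f(x):
    return (x + 1) ** 2

def g(x):
    return x

def df(x):   # derivative of f
    return 2 * (x + 1)

def dg(x):   # derivative of g
    return 1.0

x_opt = 0.0
b = -2.0

# Condition 1: stationarity of L in x
assert abs(df(x_opt) + b * dg(x_opt)) < 1e-12
# Condition 3: primal feasibility of the inequality
assert g(x_opt) >= 0
# Condition 4: multiplier sign (b <= 0 in this convention)
assert b <= 0
# Condition 5: complementary slackness
assert abs(b * g(x_opt)) < 1e-12
```

Note that the inequality here is active (g(x’) = 0), which is why b is allowed to be nonzero; if the constraint were inactive, condition 5 would force b = 0.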

## Proof

### Lemma 1

Consider the univariate optimization problem x’ = argmin[x] f(x) s.t. x >= 0.

x’ must fall into one of the following cases:

1. it is an interior point, i.e. x’ > 0

2. it is on the boundary, i.e. x’ = 0

For case 1, we have D[f(x),x](x’) = 0.

For case 2, we have D[f(x),x](x’) >= 0, because f(x’) must be no greater than f at its feasible neighbours: from x’ = 0, x can only move toward positive infinity, so f(x) must be non-decreasing there.

So, in both cases the following hold

1. x’ >= 0

2. D[f(x),x](x’) >= 0

3. x’ * D[f(x),x](x’) = 0

For max f(x) s.t. x >= 0, condition 2 changes to D[f(x),x](x’) <= 0.
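Lemma 1 can be sanity-checked numerically. The sketch below uses an assumed family f(x) = (x - c)^2, whose minimizer over x >= 0 is max(c, 0) in closed form, exercising both the interior case (c > 0) and the boundary case (c < 0); the helper name check_lemma is hypothetical:

```python
# Numeric check of Lemma 1 on f(x) = (x - c)^2 over x >= 0 (assumed example).
# c > 0 gives an interior optimum; c < 0 pushes the optimum onto the boundary.

def check_lemma(c):
    df = lambda x: 2 * (x - c)      # derivative of f
    x_opt = max(c, 0.0)             # closed-form minimizer over x >= 0
    assert x_opt >= 0                        # 1. feasibility
    assert df(x_opt) >= -1e-12               # 2. derivative non-negative at x'
    assert abs(x_opt * df(x_opt)) < 1e-12    # 3. x' * f'(x') = 0
    return x_opt, df(x_opt)

print(check_lemma(3.0))    # interior case:  (3.0, 0.0)
print(check_lemma(-2.0))   # boundary case:  (0.0, 4.0)
```

In the interior case the derivative vanishes and condition 3 holds because f'(x’) = 0; in the boundary case it holds because x’ = 0, mirroring the two cases of the proof.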

### Main proof

First, introduce a slack variable for each inequality, i.e.

convert g[j](x) >= 0 into g[j](x) = s[j] && s[j] >= 0

then, the original problem statement becomes

min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0 && s[j] >= 0 for i=1..M, j=1..N

(NOTE: this does not change the problem or its solution, since each s[j] is fully determined by x)

Ignoring s[j] >= 0 for a moment, this is an equality-constrained optimization problem,

so from the proof of the Lagrange multiplier method, we have

min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0

= min L(x,s,a,b)

where

L(x,s,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*(g[j](x)-s[j]) )

Differentiating L(x,s,a,b) with respect to x, a[i], and b[j] and setting the derivatives to zero gives conditions 1, 2, and 3 (the derivative in b[j] gives g[j](x) = s[j], which together with s[j] >= 0 yields condition 3).

Now consider minimizing L(x,s,a,b) over each s[j]. From Lemma 1, we have

1. s[j] >= 0

2. D[ L(x,s,a,b), s[j] ](s[j]) >= 0

3. s[j] * D[L(x,s,a,b),s[j]](s[j]) = 0

Since D[ L(x,s,a,b), s[j] ] = -b[j], substituting back into 2 & 3 and using s[j] = g[j](x') (from the equality constraint), we get

2. b[j] <= 0 for j=1..N

3. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)

These are condition 4 and 5.
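Putting the whole theorem together, the sketch below checks all five conditions on an assumed two-variable example: min f(x,y) = x^2 + y^2 s.t. h(x,y) = x + y - 1 = 0 and g(x,y) = x >= 0. The minimizer (1/2, 1/2) and multipliers a = -1, b = 0 are worked out by hand (the inequality is inactive, so complementary slackness forces b = 0); all names are illustrative:

```python
# Assumed example: min f(x,y) = x^2 + y^2
#   s.t.  h(x,y) = x + y - 1 = 0  and  g(x,y) = x >= 0.
# Minimizing on the line x + y = 1 gives (x, y) = (1/2, 1/2); the inequality
# is inactive there (g = 1/2 > 0), so b = 0, and stationarity
# grad f + a*grad h = 0 gives a = -1.

x, y = 0.5, 0.5
a, b = -1.0, 0.0

grad_f = (2 * x, 2 * y)   # gradient of f at (x, y)
grad_h = (1.0, 1.0)       # gradient of h
grad_g = (1.0, 0.0)       # gradient of g

# 1. stationarity: grad f + a*grad h + b*grad g = 0
for i in range(2):
    assert abs(grad_f[i] + a * grad_h[i] + b * grad_g[i]) < 1e-12
# 2. equality feasibility: h(x, y) = 0
assert abs(x + y - 1) < 1e-12
# 3. inequality feasibility: g(x, y) >= 0
assert x >= 0
# 4. multiplier sign: b <= 0
assert b <= 0
# 5. complementary slackness: b * g(x, y) = 0
assert abs(b * x) < 1e-12
```

This is the complementary situation to the earlier one-variable example: there the inequality was active and b was nonzero, while here it is slack and b must vanish.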