## Reflection of past 5 years

“…I was in love with the result… but I wasn’t in love with the process.  And because of that, I failed at it. Repeatedly. Hell, I didn’t even try hard enough to fail at it.  I hardly tried at all…”

## Problem Statement

Given a sequence of integer e.g. [a,b,c,d], find the last decimal digits of the exponential of the sequence e.g. a^(b^(c^d))%10

## Solution (in Javascript)

```  function lastDigit(arr){
let a = i => i >= arr.length ? 1 : arr[i];
let isZero = i => a(i) === 0 && isNotZero(i+1);
let isNotZero = i => a(i) !== 0 || a(i) === 0 && isZero(i+1);
let isOne = i => a(i) === 1 || isZero(i+1);

if( isZero(1) ) return 1;

// n = a1^(a2^a3)%4
var n;
if( isZero(2) ) n = 1;
else if( isOne(2) ) n = (a(1)+3)%4+1;
else if( a(1) % 2 === 0 ) n = 4;
else if( a(2) % 2 === 0 ) n = 1;
else n = (a(1)+3)%4+1;

return Math.pow(a(0)%10, n)%10;
}
```

## Data Storage

After days of research, I just find out that ZFS is the ONLY solution that meet my requirements.
1. survive from silent data corruption and disk failure.
2. flexible in adding/removal of hard disks.
3. support encryption

It is suprised to me that many solutions seem do not care about silent data corruption (SDC) or bad sector. Full disk encryption with LUKS and LVM is not tolerent to SDC. Hardware RAID is not flexible, since I cannot
add various size of hard drives. Software RAID-1 has detection, but not correction. In theory, RAID-1 with three drives could have correction, but it seems no software implementations support. RAID-6 support both detection and correction, but software controlled RAID-6 has performance concern.

## Multi-armed Bandit

Multi-armed bandit problem (MAB) is usually said to be exploitation vs exploration. There are a number of strategies to deal with MAB. Personaly, I like to treat MAB as portfolio theory with unknown expected value and variance, and the weights in portfolio correspond to probability of choosing that option.

Usually, we just want to maximize the return of MAB, but portfolio approach requires us to supply a minimum bound on return and minimize the variance. And the min-bound could be changed dynamically, IMO, this is nice.

## Portfolio Theory (Part 2)

In part 1, we have considered those with equality constraints. In this part, we also consider the situation that all weights are greater than zero and expected return is a min-bounded instead of exact value.

## Problem statement

Given N random variables with known expected value u[i],i=1..N and covariance matrix v[i,j] for i,j=1..N and a target return R. Find weight w[i] such that
1. sum[i=1..N]( w[i] ) = W (usually set W = 1)
2. sum[i=1..N]( w[i]*u[i] ) >= R
3. w[i] >= 0 for i=1..N
4. minimize sum[i=1..N,j=1..N]( w[i]*v[i,j]*w[j] ) i.e. variance of portfolio

## Algorithm

First, we converse all inequalities into form of x >= 0. Specifically, we introduce a slack variable s and change this constraint
sum[i=1..N]( w[i]*u[i] ) >= R
into
sum[i=1..N]( w[i]*u[i] ) + s = R && s >= 0

The algorithm run as following.
1. solve the optimization problem with only equality constraints by Lagrange Mulipliers, name the solution as opt.
2. if opt is feasible, then we are done. If not, there must be some inequalities violated (i.e. some varibles are negative). For each negative variabls, we set it to zero and repeat step 1.

## Code

```import numpy as np
from numpy.linalg import lstsq

EPS = 1e-10
def portfolio2(u,v,R,W=1):
if R > np.amax(u) or R < np.amin(u):
raise ArithmeticError('no solution')
n = len(u)
ans = -np.ones(n)
ix = []           # index of variables that are zero
while np.any(ans < -EPS):
# standard solution
m = np.zeros((n+2,n+2))
m[0:n,0:n] = 2*v
m[n,0:n] = m[0:n,n] = 1
m[n+1,0:n] = m[0:n,n+1] = u
y = np.zeros(n+2)
y[n] = W
y[n+1] = R

# set variables to zero
m[:, ix] = 0
# remove corresponding Lagrange multiplers
m = np.delete(m, ix, 0)
y = np.delete(y, ix)

# solve by least square
ans = lstsq(m, y)[0][0:n];

# update variables that are zero
ix = np.union1d(ix, np.arange(n)[ans < -EPS]).astype(int)
return ans
```

## Overview

KKT multiplier is an extension to Lagrange multiplier, which deal with both equality and inequality. However, unlike Lagrange multiplier, KKT multiplier do not convert constrained optimization problem into system of equations, but it provides a necessary conditions for solution that we can search for.

For convex objective, there is only one extrema, so KKT conditions are also sufficient conditions.

## Statment

let x’ = argmin[x] f(x) s.t. h[i](x) = 0 && g[j](x) >= 0 for i=1..M, j=1..N
let L(x,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*g[j](x) )

The following statements hold
1. D[L(x,a,b),x](x’) = 0
2. h[i](x’) = 0 for i=1..M (given)
3. g[j](x’) >= 0 for j=1..N (given)
4. b[j] <= 0 for j=1..N
5. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)

## Proof

### Lemma 1

consider a univariate optimization x’ = argmin[x] f(x) s.t. x >= 0
x’ must be one of the following cases
1. it is an interior-point i.e. x’ > 0
2. it is on edge i.e. x’ = 0
for case 1, we have D[f(x),x](x’) = 0
for case 2, we have D[f(x),x](x’) >= 0 because f(x’) must smaller than its neighbours. As x move in direction to positive infinity, f(x) must be increasing.

so, in all cases the followings hold
1. x’ >= 0
2. D[f(x),x](x’) >= 0
3. x’ * D[f(x),x](x’) = 0

for max f(x) s.t. x >= 0, condition 2 change to be D[f(x),x](x’) <= 0

### Proof

First, introduce slack variable for inequalities. i.e.
convert g(x) >= 0 to be g(x) = s && s >= 0

then, the original problem statement become
min[x] f(x) s.t. h[i](x) = 0 && g[j](x) – s[j] = 0 && s[j] >= 0 for i=1..M, j=1..N
(NOTE: this does not change the problem and solution)

Ignore s[j] >= 0 for a moment, this is equality constraints optimization,
from the proof of Lagrange Multiplier, we have
min[x] f(x) s.t. h[i](x) = 0 && g[j](x) – s[j] = 0
= min L(x,s,a,b)
where
L(x,s,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*(g[j](x)-s[j]) )

Differentiate L(x,s,a,b) with x,a,b, we will get condition 1,2,3.

Consider min L(x,s,a,b) over s[j], from lemma 1, we have
1. s[j] >= 0
2. D[ L(x,s,a,b), s[j] ](s[j]) >= 0
3. s[j] * D[L(x,s,a,b),s[j]](s[j]) = 0
Since D[ L(x,s,a,b), s[j] ] = -b[j] put back into 2 & 3, we have
2. b[j] <= 0 for j=1..N
3. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)

These are condition 4 and 5.