## A Hard Problem for Computers (and Me)


Given a sequence of integers, e.g. [a,b,c,d], find the last decimal digit of the iterated exponential of the sequence, e.g. a^(b^(c^d)) % 10.

```javascript
function lastDigit(arr) {
  let a = i => i >= arr.length ? 1 : arr[i];
  let isZero = i => a(i) === 0 && isNotZero(i + 1);
  let isNotZero = i => a(i) !== 0 || (a(i) === 0 && isZero(i + 1));
  let isOne = i => a(i) === 1 || isZero(i + 1);
  if (isZero(1)) return 1;
  // n = a1^(a2^a3) % 4, mapped into the range 1..4
  let n;
  if (isZero(2)) n = 1;
  else if (isOne(2)) n = (a(1) + 3) % 4 + 1;
  else if (a(1) % 2 === 0) n = 4;
  else if (a(2) % 2 === 0) n = 1;
  else n = (a(1) + 3) % 4 + 1;
  return Math.pow(a(0) % 10, n) % 10;
}
```
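As a sanity check, the result can be compared against a direct evaluation for short sequences with small values. This brute-force helper is my own illustration, not part of the solution above:

```python
# Brute-force cross-check: evaluate the power tower exactly with big
# integers, then take the last decimal digit. Only practical for small
# inputs, since the intermediate numbers grow extremely fast.
def last_digit_bruteforce(arr):
    result = 1
    for x in reversed(arr):
        result = x ** result
    return result % 10

print(last_digit_bruteforce([3, 4, 2]))  # 3^(4^2) = 43046721 -> 1
print(last_digit_bruteforce([2, 3]))     # 2^3 = 8            -> 8
```

Note that the empty tail of the sequence is treated as exponent 1, matching the convention `a(i) = 1` for out-of-range indices in the solution above.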

After days of research, I found that ZFS is the ONLY solution that meets my requirements:

1. survives silent data corruption and disk failure

2. flexible addition and removal of hard disks

3. supports encryption

It surprised me that many solutions seem not to care about silent data corruption (SDC) or bad sectors. Full-disk encryption with LUKS and LVM is not tolerant of SDC. Hardware RAID is not flexible, since I cannot add drives of various sizes. Software RAID-1 has detection, but not correction. In theory, RAID-1 with three drives could support correction by majority vote, but no software implementation seems to support it. RAID-6 supports both detection and correction, but software-controlled RAID-6 raises performance concerns.

The multi-armed bandit problem (MAB) is usually framed as **exploitation vs exploration**, and there are a number of strategies to deal with it. Personally, I like to treat MAB as portfolio theory with unknown expected values and variances, where the weights in the portfolio correspond to the probability of choosing each option.

Usually, we just want to maximize the return of the MAB, but the portfolio approach requires us to supply a minimum bound on the return and minimize the variance. The minimum bound can also be changed dynamically, which, in my opinion, is nice.

In part 1, we considered the problem with equality constraints only. In this part, we additionally require all weights to be non-negative, and the expected return to be min-bounded instead of an exact value.

Given N random variables with known expected values u[i], i=1..N, a covariance matrix v[i,j] for i,j=1..N, and a target return R, find weights w[i] such that

1. sum[i=1..N]( w[i] ) = W (usually set W = 1)

2. sum[i=1..N]( w[i]*u[i] ) >= R

3. w[i] >= 0 for i=1..N

4. minimize sum[i=1..N,j=1..N]( w[i]*v[i,j]*w[j] ), i.e. the portfolio variance

First, we convert all inequalities into the form x >= 0. Specifically, we introduce a slack variable s and change the constraint

sum[i=1..N]( w[i]*u[i] ) >= R

into

sum[i=1..N]( w[i]*u[i] ) - s = R && s >= 0

The algorithm runs as follows.

1. solve the optimization problem with only the equality constraints by Lagrange multipliers; call the solution opt.

2. if opt is feasible, we are done. If not, some inequality must be violated (i.e. some variables are negative). For each negative variable, set it to zero and repeat step 1.
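Step 1 on its own, the equality-constrained solve via Lagrange multipliers, can be sketched in NumPy. The two-asset data below is made up for illustration; with two assets, the two equality constraints already pin the weights down:

```python
import numpy as np

# Hypothetical data: 2 assets with made-up returns and covariances.
u = np.array([0.05, 0.15])           # expected returns
v = np.array([[0.10, 0.01],
              [0.01, 0.20]])         # covariance matrix
R, W = 0.10, 1.0                     # target return, total weight
n = len(u)

# KKT system for: min w'vw  s.t.  sum(w) = W  and  w.u = R
#   [ 2v  1  u ] [w]   [0]
#   [ 1'  0  0 ] [a] = [W]
#   [ u'  0  0 ] [b]   [R]
m = np.zeros((n + 2, n + 2))
m[:n, :n] = 2 * v
m[n, :n] = m[:n, n] = 1
m[n + 1, :n] = m[:n, n + 1] = u
y = np.zeros(n + 2)
y[n], y[n + 1] = W, R

w = np.linalg.solve(m, y)[:n]
print(w)  # approximately [0.5, 0.5], forced by the two constraints
```

With more than two assets the same system has spare degrees of freedom, and the solver picks the minimum-variance weights among all feasible ones.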

```python
import numpy as np
from numpy.linalg import lstsq

EPS = 1e-10

def portfolio2(u, v, R, W=1):
    if R > np.amax(u) or R < np.amin(u):
        raise ArithmeticError('no solution')
    n = len(u)
    ans = -np.ones(n)
    ix = []  # indices of variables forced to zero
    while np.any(ans < -EPS):
        # standard equality-constrained solution
        m = np.zeros((n + 2, n + 2))
        m[0:n, 0:n] = 2 * v
        m[n, 0:n] = m[0:n, n] = 1
        m[n + 1, 0:n] = m[0:n, n + 1] = u
        y = np.zeros(n + 2)
        y[n] = W
        y[n + 1] = R
        # set the forced variables to zero
        m[:, ix] = 0
        # remove the corresponding Lagrange multipliers
        m = np.delete(m, ix, 0)
        y = np.delete(y, ix)
        # solve by least squares
        ans = lstsq(m, y, rcond=None)[0][0:n]
        # grow the set of variables forced to zero
        ix = np.union1d(ix, np.arange(n)[ans < -EPS]).astype(int)
    return ans
```

KKT multipliers are an extension of Lagrange multipliers that deals with both equality and inequality constraints. However, unlike Lagrange multipliers, KKT multipliers do not convert the constrained optimization problem into a system of equations; instead, they provide necessary conditions for a solution, which we can search over.

For a convex objective, there is only one extremum, so the KKT conditions are also sufficient.

let x’ = argmin[x] f(x) s.t. h[i](x) = 0 && g[j](x) >= 0 for i=1..M, j=1..N

let L(x,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*g[j](x) )

The following statements hold

1. D[L(x,a,b),x](x’) = 0

2. h[i](x’) = 0 for i=1..M (given)

3. g[j](x’) >= 0 for j=1..N (given)

4. b[j] <= 0 for j=1..N

5. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)
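As a sanity check, the five conditions can be verified numerically on a small problem of my own choosing (note the sign convention above, where b[j] <= 0 for minimization):

```python
# Toy problem:  min x^2 + y^2  s.t.  x + y - 2 = 0  and  x >= 0.
# Known optimum: (x, y) = (1, 1) with multipliers a = -2, b = 0.
x, y = 1.0, 1.0
a, b = -2.0, 0.0

# L(x, y, a, b) = x^2 + y^2 + a*(x + y - 2) + b*x
dLdx = 2 * x + a + b
dLdy = 2 * y + a

assert abs(dLdx) < 1e-12 and abs(dLdy) < 1e-12  # 1. stationarity of L
assert abs(x + y - 2) < 1e-12                   # 2. h(x') = 0
assert x >= 0                                   # 3. g(x') >= 0
assert b <= 0                                   # 4. b <= 0
assert abs(b * x) < 1e-12                       # 5. complementary slackness
```

Here the inequality x >= 0 is inactive at the optimum, so complementary slackness forces b = 0.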

Consider a univariate optimization x’ = argmin[x] f(x) s.t. x >= 0.

x’ must fall into one of two cases:

1. it is an interior-point i.e. x’ > 0

2. it is on edge i.e. x’ = 0

for case 1, we have D[f(x),x](x’) = 0

for case 2, we have D[f(x),x](x’) >= 0, because f(x’) must be no larger than the values at its feasible neighbours: as x moves from 0 towards positive infinity, f(x) must be non-decreasing near 0.

So, in all cases, the following hold:

1. x’ >= 0

2. D[f(x),x](x’) >= 0

3. x’ * D[f(x),x](x’) = 0

For max[x] f(x) s.t. x >= 0, condition 2 changes to D[f(x),x](x’) <= 0.
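A quick numerical illustration of the lemma, using two toy objectives of my own:

```python
# Check the three conditions at the known minimizer x' of
#   min f(x)  s.t.  x >= 0

def check(x_opt, df):
    assert x_opt >= 0                       # 1. x' >= 0
    assert df(x_opt) >= 0                   # 2. D[f(x),x](x') >= 0
    assert abs(x_opt * df(x_opt)) < 1e-12   # 3. x' * D[f(x),x](x') = 0

# interior case: f(x) = (x - 1)^2 has its minimum at x' = 1 > 0
check(1.0, lambda x: 2 * (x - 1))

# boundary case: f(x) = (x + 1)^2 restricted to x >= 0 has x' = 0,
# where the derivative 2*(x + 1) = 2 is positive, not zero
check(0.0, lambda x: 2 * (x + 1))
```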

First, introduce a slack variable for each inequality, i.e.

convert g(x) >= 0 into g(x) = s && s >= 0

then the original problem statement becomes

min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0 && s[j] >= 0 for i=1..M, j=1..N

(Note: this changes neither the problem nor its solution.)

Ignoring s[j] >= 0 for a moment, this is an equality-constrained optimization, so from the proof of Lagrange multipliers we have

min[x] f(x) s.t. h[i](x) = 0 && g[j](x) - s[j] = 0

= min L(x,s,a,b)

where

L(x,s,a,b) = f(x) + sum[i=1..M]( a[i]*h[i](x) ) + sum[j=1..N]( b[j]*(g[j](x)-s[j]) )

Differentiating L(x,s,a,b) with respect to x, a, and b yields conditions 1, 2, and 3.

Now consider min L(x,s,a,b) over s[j]. From the lemma above (the univariate case), we have

1. s[j] >= 0

2. D[ L(x,s,a,b), s[j] ](s[j]) >= 0

3. s[j] * D[L(x,s,a,b),s[j]](s[j]) = 0

Since D[ L(x,s,a,b), s[j] ] = -b[j] and s[j] = g[j](x'), substituting back into 2 & 3 gives

2. b[j] <= 0 for j=1..N

3. b[j]*g[j](x') = 0 for j=1..N (complementary slackness)

These are condition 4 and 5.

The Kalman filter can aggregate multiple sources of data with Gaussian errors, and it also supports adding two noisy quantities. This rests on two closure properties of the normal distribution.

Let a and b be two independent random variables such that

a ~ Normal( u1, sd1 )

b ~ Normal( u2, sd2 )

then

1. a+b ~ Normal( u1 + u2, sqrt( sd1^2 + sd2^2) )

2. the product of their densities is proportional to another normal density: normal(x, u1, sd1) * normal(x, u2, sd2) = Z * normal(x, u3, sd3) for some constant Z independent of x

where

normal(x,u,sd) = exp( -(x-u)^2/(2*sd^2) ) / ( sqrt(2*pi) * sd )

u3 = (u1*sd2^2 + u2*sd1^2)/(sd1^2+sd2^2)

sd3 = 1/ sqrt( 1/sd1^2 + 1/sd2^2 )
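The second property can be checked numerically: the ratio of the density product to normal(x, u3, sd3) should be the same constant at every x. A minimal sketch, with arbitrarily chosen parameters:

```python
import math

def normal_pdf(x, u, sd):
    # density of Normal(u, sd)
    return math.exp(-(x - u) ** 2 / (2 * sd ** 2)) / (math.sqrt(2 * math.pi) * sd)

u1, sd1 = 1.0, 2.0   # arbitrary parameters for the check
u2, sd2 = 3.0, 1.0

u3 = (u1 * sd2 ** 2 + u2 * sd1 ** 2) / (sd1 ** 2 + sd2 ** 2)
sd3 = 1 / math.sqrt(1 / sd1 ** 2 + 1 / sd2 ** 2)

# The ratio product / normal_pdf(x, u3, sd3) should not depend on x.
ratios = [
    normal_pdf(x, u1, sd1) * normal_pdf(x, u2, sd2) / normal_pdf(x, u3, sd3)
    for x in (-1.0, 0.0, 2.0, 5.0)
]
assert max(ratios) - min(ratios) < 1e-12
```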