Multi-armed Bandit

The multi-armed bandit (MAB) problem is usually described as a trade-off between exploitation and exploration. There are a number of strategies for dealing with it. Personally, I like to treat MAB as a portfolio-theory problem in which the expected values and variances are unknown, and the weight of each option in the portfolio corresponds to the probability of choosing that option.
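
To make the portfolio reading concrete, here is a minimal Python sketch. It assumes the arms' rewards are independent, so the covariance matrix is diagonal (in a bandit only one arm is pulled per round, so cross-arm covariance is not observed directly). `ArmStats`, `portfolio_stats` and `choose_arm` are hypothetical names used only for illustration. Each arm keeps a running mean and variance (Welford's algorithm). A weight vector is then scored like a portfolio, and the same weights are used as the probabilities of pulling each arm.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Running mean and sample variance of one arm's rewards (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, reward: float) -> None:
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until the arm has at least 2 pulls.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def portfolio_stats(weights, arms):
    """Expected return and variance of the 'portfolio' w, assuming independent arms."""
    exp_return = sum(w * a.mean for w, a in zip(weights, arms))
    variance = sum(w * w * a.variance for w, a in zip(weights, arms))
    return exp_return, variance


def choose_arm(weights, rng):
    """Interpret the portfolio weights as the probability of pulling each arm."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    arms = [ArmStats(), ArmStats()]
    for r in [1.0, 2.0, 3.0]:
        arms[0].update(r)
    for r in [4.0, 4.0]:
        arms[1].update(r)
    weights = [0.5, 0.5]
    print(arms[0].mean, arms[0].variance)         # 2.0 1.0
    print(portfolio_stats(weights, arms))         # (3.0, 0.25)
    print(choose_arm(weights, random.Random(0)))  # arm index, 0 or 1
```

With a diagonal covariance, the portfolio variance reduces to the sum of squared weights times each arm's variance. This keeps the sketch simple, but it is an assumption, not part of the original idea.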

Usually, we just want to maximize the return of an MAB. The portfolio approach instead asks us to supply a minimum bound on the expected return and then minimize the variance subject to that bound. The minimum bound can also be changed dynamically, which, IMO, is nice.
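
The min-bound formulation can be sketched as a small quadratic program: minimize sum_i w_i^2 * var_i subject to sum_i w_i = 1, w_i >= 0 and sum_i w_i * mu_i >= r_min. The Python sketch below is one possible implementation under stated assumptions, not the author's method. `min_variance_weights` and `run` are hypothetical names. It assumes independent arms with positive variances, and it solves the KKT conditions by nested bisection instead of calling a QP library. To illustrate changing the bound dynamically, it uses a made-up schedule that raises the floor from the midpoint of the estimated means toward the best one as rounds pass. The objective is the portfolio variance from the analogy, not the variance of a single randomized pull.

```python
import random


def min_variance_weights(mu, var, r_min, iters=60):
    """Minimize sum(w_i^2 * var_i) s.t. sum(w_i) = 1, w_i >= 0, sum(w_i * mu_i) >= r_min.

    Assumes independent arms (diagonal covariance) and var_i > 0. The KKT
    conditions give w_i = max(0, (lam * mu_i + nu) / (2 * var_i)), where
    lam >= 0 prices the return floor and nu makes the weights sum to 1.
    """
    if r_min > max(mu):
        raise ValueError("return floor exceeds the best estimated mean")
    target = r_min - 1e-12 * (1 + abs(r_min))  # small tolerance for rounding

    def weights_for(lam):
        # Total weight is nondecreasing in nu, so bisect on nu.
        lo = -lam * max(mu)                # every weight is 0 here
        hi = 2 * max(var) - lam * min(mu)  # every weight is >= 1 here
        for _ in range(iters):
            nu = (lo + hi) / 2
            total = sum(max(0.0, (lam * m + nu) / (2 * v)) for m, v in zip(mu, var))
            if total < 1:
                lo = nu
            else:
                hi = nu
        w = [max(0.0, (lam * m + hi) / (2 * v)) for m, v in zip(mu, var)]
        s = sum(w)
        return [x / s for x in w]

    def expected_return(w):
        return sum(x * m for x, m in zip(w, mu))

    w = weights_for(0.0)
    if expected_return(w) >= target:
        return w  # floor not binding: plain minimum-variance weights

    # Expected return is nondecreasing in lam: bracket the floor, then bisect.
    lo, hi = 0.0, 1.0
    while expected_return(weights_for(hi)) < target and hi < 1e12:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_return(weights_for(mid)) < target:
            lo = mid
        else:
            hi = mid
    return weights_for(hi)


def run(true_means, true_sds, rounds=300, seed=0):
    """Simulate Gaussian arms, pulling each arm with probability equal to its weight."""
    rng = random.Random(seed)
    k = len(true_means)
    # Two warm-up pulls per arm so every sample variance is defined.
    rewards = [[rng.gauss(m, s) for _ in range(2)] for m, s in zip(true_means, true_sds)]
    for t in range(rounds):
        mu = [sum(r) / len(r) for r in rewards]
        var = [max(1e-6, sum((x - m) ** 2 for x in r) / (len(r) - 1))
               for r, m in zip(rewards, mu)]
        # Hypothetical schedule: raise the floor from the midpoint toward the best mean.
        alpha = 0.5 + 0.45 * t / rounds
        r_min = min(mu) + alpha * (max(mu) - min(mu))
        w = min_variance_weights(mu, var, r_min)
        arm = rng.choices(range(k), weights=w)[0]
        rewards[arm].append(rng.gauss(true_means[arm], true_sds[arm]))
    return [len(r) for r in rewards]  # pulls per arm, including warm-up


if __name__ == "__main__":
    print(min_variance_weights([1.0, 2.0], [1.0, 1.0], r_min=1.8))  # approx [0.2, 0.8]
    print(run([1.0, 1.5, 2.0], [1.0, 0.5, 2.0]))
```

If the floor is not binding, the solver returns the plain minimum-variance weights, which are proportional to 1 / var_i. Raising `r_min` shifts weight toward the arms with higher estimated means, so the floor acts as a knob for exploitation.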


