Problem statement:
Assume that there are 3 independent arms, each arm i when being played will give a reward of either 0 or 1, according to a Bernoulli(p_i). Real-world situation: 3 online ads banners. Each ads i, once displayed, will be clicked with some probability p_i
Objective: which ads to display so that we have maximum click-through rate.
Now initially let the prior of each p_i be Beta(1,1), which is uniform. If right now each p_i is Beta(a,b) (conjugate prior for binomial), then in the if arm i is played in the next round, p_i is still Beta(a’,b’), where a’ = a + # of successes, and b’ = b + # of failures
We can do the same thing in case of Gaussian distributions too.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import beta
num_trials = 1000
#assume there are 3 arms
ARM_TRUEVALS = [0.2, 0.5, 0.75]#true val of each arm.
#which is the P[winning] if we play that arm.
#Q: how many arms should we simulate?
#First, define each arm as an object, so that we can update
#each arm systematically
class Arm(object):
def __init__(self, p):
'''
initialize the arm
p is prob[winning] for that arm, i.e., its true value
'''
self.p = p #initialize the arm true values
self.a = 1 #initial val of Beta distribution
self.b = 1 #initial val of Beta distribution
#for both a and b are 1, Beta distr. is just a uniform distr.
#so initially we have no knowledge about each arm.
def play(self):
'''
play the arm, return 1 with probability p
return 0 otherwise
'''
return np.random.random() < self.p
def reward(self):
'''
return a random reward from playing this arm
which is a random sample from a Beta distr.
'''
return np.random.beta(self.a, self.b)
def update(self, x):
'''
to udpate the arm's parameters a and b (Beta)
these update formulae we derived from the conjugate priors
where x is either 0 or 1.
'''
self.a += x
self.b += 1 - x
# Define a plotting function to plot the arm's pdf
def plot(arms, trial):
'''
Input:
- arms is a list of Arms objects that we want to plot pdfs
- trial = number of trials up to the plotting time
Output:
- To plot each arm pdf on the same chart to compare
'''
x = np.linspace(0, 1, 200)#200 points on the x-axis
for b in arms:
#plot of the pdf of beta
y = beta.pdf(x, b.a, b.b)
plt.plot(x, y, label="real p: %.4f" % b.p)
plt.title("Arm distributions after %s trials" % trial)
plt.legend()
plt.show()
#Now we define the main function experiment
def experiment():
'''
main function to run this experiment
'''
#First, initialize a bunch of arms
arms = [Arm(p) for p in ARM_TRUEVALS]
#Below is a list of number of trials during the experiment when
#we want to take a look at how close we are to convergence
sample_points = [5,10,100,500,num_trials-1]
#Now, during experiment we will play the best current arm
#according to TS sampling algorithm.
for i in range(num_trials):
#Initialize the best arm up to now
bestArm = None
#initialize best reward
maxReward = -1
allRewards = [] # let's collect these just to print for debugging
#Now, we pick an arm according to the best current reward
for j in arms:
reward = j.reward()#sample reward from the playing arm j
allRewards.append("%.4f" % reward)
if reward > maxReward:
maxReward = reward #keep track of max reward
bestArm = j
if i in sample_points:#then plot
print("current sample rewards: %s" % allRewards)
plot(arms, i) #i is the trial number
# now, we have found the best arm for this triall i, then
# we just play it
x = bestArm.play()#gives a 0 or 1, which indicates a success/failure
#this is used to update the best arm as follows:
# update the distribution for the best arm we just played
bestArm.update(x)
if __name__ == "__main__":
experiment()
Reference: Adapted from the following:
1) Class OPNS525: Statistical Learning in Sequential Decision Making at Northwestern by Prof. Daniel Russo.
2) Online class: Bayesian Machine Learning in Python: A/B Testing.