36.2 Frequentist analysis

1: Input: Cumulative distribution functions $F_1(1), \dots, F_k(1)$
2: for $t = 1, \dots, n$ do
3:   Sample $\theta_i(t) \sim F_i(t)$ for each $i$
4:   Choose $A_t = \operatorname{argmax}_{i \in [k]} \theta_i(t)$
5:   Observe $X_t$ and update:
       $F_i(t+1) = F_i(t)$ for $i \ne A_t$  and  $F_{A_t}(t+1) = \mathrm{Update}(F_{A_t}(t), A_t, X_t)$
6: end for
Algorithm 23: Follow the perturbed leader.

Thompson sampling is recovered by choosing $F_1(1), \dots, F_k(1)$ to be the cumulative
distribution functions of the mean reward of each arm under a prior that is
independent over the arms, and by letting $\mathrm{Update}$ be the function that updates
the posterior for the played arm. There are, however, many alternative ways to
configure this algorithm.
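
To make this configuration concrete, the following is a minimal sketch of Algorithm 23 specialised to Bernoulli rewards with independent Beta(1, 1) priors on each arm, so that sampling from $F_i(t)$ and the conjugate $\mathrm{Update}$ step recover Thompson sampling. The function name, the simulated environment and the chosen parameters are illustrative assumptions, not part of the text.

```python
import numpy as np

def follow_the_perturbed_leader(means, n, rng=None):
    """Algorithm 23 for Bernoulli rewards with Beta(1, 1) priors,
    which recovers Thompson sampling. `means` are the true arm means,
    used here only to simulate the rewards X_t."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(means)
    # Beta posterior parameters for each arm; F_i(1) is the Beta(1, 1) cdf.
    alpha = np.ones(k)
    beta = np.ones(k)
    total_reward = 0.0
    for t in range(n):
        # Sample theta_i(t) ~ F_i(t) for each arm.
        theta = rng.beta(alpha, beta)
        # Choose A_t = argmax_i theta_i(t).
        a = int(np.argmax(theta))
        # Observe X_t (simulated Bernoulli reward).
        x = float(rng.random() < means[a])
        total_reward += x
        # Update only the played arm: conjugate Beta update for F_{A_t}.
        alpha[a] += x
        beta[a] += 1.0 - x
    return total_reward

# Example usage: a two-armed Bernoulli bandit over 10,000 rounds.
print(follow_the_perturbed_leader(means=[0.5, 0.45], n=10_000))
```

Note that the posteriors of the unplayed arms are left untouched inside the loop, which is exactly the property exploited in the analysis below.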

The core property that we use in the analysis of Algorithm 23 (to be
presented soon) is that $F_i(t+1) = F_i(t)$ whenever $A_t \ne i$. When $\mathrm{Update}(\cdot)$
is a Bayesian update this corresponds to choosing an independent prior on
the distribution of each arm.

Let $F_{is}$ be the cumulative distribution function used for arm $i$ in the round $t$ when
$T_i(t-1) = s$. This quantity is defined even if $T_i(n) < s$ by using the stacked
model from Section 4.6.

Theorem 36.2. Let $i$ be an action and $\varepsilon \in \mathbb{R}$ be arbitrary. Then the expected
number of times Algorithm 23 plays action $i$ is bounded by
\[
\mathbb{E}[T_i(n)] \le 1 + \mathbb{E}\left[\sum_{s=0}^{n-1}\left(\frac{1}{G_{1s}} - 1\right)\right]
+ \mathbb{E}\left[\sum_{s=0}^{n-1}\mathbb{I}\{G_{is} > 1/n\}\right], \tag{36.3}
\]
where $G_{is} = 1 - F_{is}(\mu_1 - \varepsilon)$.


In applications $\varepsilon$ is normally chosen to be a small positive constant. In this case
the first sum in Eq. (36.3) measures the probability that the sample corresponding
to the first arm is nearly optimistic and tends to be smaller when the variance
of the perturbation is larger. The second sum measures the likelihood that
the sample from arm $i$ is close to $\mu_1$ and is small when the variance of the
perturbation is small. Balancing these two terms corresponds to optimizing the
exploration/exploitation tradeoff.
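
As a rough illustration of this tradeoff, the sketch below estimates $G_{is} = 1 - F_{is}(\mu_1 - \varepsilon)$ for the Bernoulli/Beta instance used earlier and forms Monte-Carlo estimates of the two sums in Eq. (36.3) under the stacked model, in which every arm receives $n$ i.i.d. rewards regardless of what is played. The instance, the choice $\varepsilon = 0.05$, the horizon and the number of simulations are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def bound_terms(mu, i, eps, n, sims=100, rng=None):
    """Monte-Carlo estimates of the two sums in Eq. (36.3) for a
    Bernoulli bandit with Beta(1, 1) priors, using the stacked model."""
    rng = np.random.default_rng() if rng is None else rng
    mu1 = mu[0]  # arm with index 0 plays the role of the optimal arm 1
    first_sum, second_sum = 0.0, 0.0
    for _ in range(sims):
        # Stacked rewards: n i.i.d. Bernoulli samples for arm 1 and arm i.
        x1 = rng.random(n) < mu1
        xi = rng.random(n) < mu[i]
        for s in range(n):
            # Beta(1 + successes, 1 + failures) posterior after s observations.
            a1, b1 = 1 + x1[:s].sum(), 1 + s - x1[:s].sum()
            ai, bi = 1 + xi[:s].sum(), 1 + s - xi[:s].sum()
            # G_{1s} and G_{is}: posterior mass above mu_1 - eps.
            g1 = 1.0 - beta_dist.cdf(mu1 - eps, a1, b1)
            gi = 1.0 - beta_dist.cdf(mu1 - eps, ai, bi)
            first_sum += 1.0 / g1 - 1.0
            second_sum += float(gi > 1.0 / n)
    return first_sum / sims, second_sum / sims

# Example usage: estimate both terms for a two-armed instance.
print(bound_terms(mu=[0.5, 0.45], i=1, eps=0.05, n=200))
```

Increasing the spread of the sampling distributions (for instance by flattening the posterior) tends to shrink the first estimate while inflating the second, which is the balance described above.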

Proof of Theorem 36.2. Let $\mathcal{F}_t = \sigma(A_1, X_1, \dots, A_t, X_t)$ and $E_i(t) = \{\theta_i(t) \le \mu_1 - \varepsilon\}$. By definition,
\[
\mathbb{P}(\theta_1(t) \ge \mu_1 - \varepsilon \mid \mathcal{F}_{t-1}) = G_{1 T_1(t-1)} \quad \text{a.s.}
\]