36.1 Finite-armed bandits
Recalling the notation from Section 34.5, let $k > 1$ and $(E, \mathcal{B}(E), Q, P)$ be a $k$-armed Bayesian bandit environment. The learner chooses actions $(A_t)_{t=1}^n$ and receives rewards $(X_t)_{t=1}^n$, and the posterior after $t$ observations is a probability kernel $Q(\cdot \mid \cdot)$ from $([k] \times \mathbb{R})^t$ to $(E, \mathcal{B}(E))$. Denote the mean of the $i$th arm in bandit $\nu \in E$ by $\mu_i(\nu) = \int_{\mathbb{R}} x \, dP_{\nu i}(x)$. In round $t$, Thompson sampling samples a bandit environment $\nu_t$ from the posterior of $Q$ given $A_1, X_1, \ldots, A_{t-1}, X_{t-1}$ and then chooses the arm with the largest mean (Algorithm 22). We assume that ties are resolved in an arbitrary, but systematic, fashion.
1: Input Bayesian bandit environment $(E, \mathcal{B}(E), Q, P)$
2: for $t = 1, 2, \ldots, n$ do
3:     Sample $\nu_t \sim Q(\cdot \mid A_1, X_1, \ldots, A_{t-1}, X_{t-1})$
4:     Choose $A_t = \operatorname{argmax}_{i \in [k]} \mu_i(\nu_t)$
5: end for

Algorithm 22: Thompson sampling.
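As a concrete illustration of Algorithm 22, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit with independent Beta(1, 1) priors on the arm means, a standard special case in which the posterior sampled in line 3 has a closed form. The Bernoulli/Beta model and all names below are illustrative choices, not part of the general setting above.

    import numpy as np

    def thompson_sampling_bernoulli(bandit_means, n, rng=None):
        """Run Thompson sampling for n rounds on a Bernoulli bandit.

        bandit_means -- true (unknown) success probabilities, one per arm.
        Returns the sequence of chosen arms and received rewards.
        """
        rng = np.random.default_rng() if rng is None else rng
        k = len(bandit_means)
        alpha = np.ones(k)  # Beta posterior parameters (1 + successes)
        beta = np.ones(k)   # Beta posterior parameters (1 + failures)
        actions, rewards = [], []
        for t in range(n):
            # Line 3: sample an environment (vector of means) from the posterior
            sampled_means = rng.beta(alpha, beta)
            # Line 4: play the arm with the largest sampled mean
            a = int(np.argmax(sampled_means))
            x = float(rng.random() < bandit_means[a])
            # Conjugate posterior update for the chosen arm only
            alpha[a] += x
            beta[a] += 1 - x
            actions.append(a)
            rewards.append(x)
        return actions, rewards

For instance, thompson_sampling_bernoulli([0.4, 0.5, 0.6], n=1000) will, with high probability, concentrate most of its plays on the third arm after a modest number of rounds.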
Thompson sampling has been analyzed in both the frequentist and the Bayesian settings. We start with the latter, where the result requires almost no assumptions on the prior. In fact, after one small observation about Thompson sampling, the analysis is almost the same as that of UCB.
Theorem 36.1. Let $(E, \mathcal{B}(E), Q, P)$ be a $k$-armed Bayesian bandit environment such that for all $\nu \in E$ and $i \in [k]$ the distribution $P_{\nu i}$ is $1$-subgaussian with mean in $[0, 1]$. Then the policy $\pi$ of Thompson sampling satisfies
$$\mathrm{BR}_n(\pi, Q) \le C \sqrt{k n \log(n)},$$
where $C > 0$ is a universal constant.
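Here $\mathrm{BR}_n(\pi, Q)$ is the Bayesian regret of Section 34.5; assuming the usual definition there, it is the regret $R_n(\pi, \nu)$ averaged over the prior,
$$\mathrm{BR}_n(\pi, Q) = \int_E R_n(\pi, \nu) \, dQ(\nu) = \mathbb{E}\left[\sum_{t=1}^n \bigl(\mu_{A^*}(\nu) - \mu_{A_t}(\nu)\bigr)\right],$$
where $A^* = \operatorname{argmax}_{i \in [k]} \mu_i(\nu)$ is the optimal arm of the sampled environment and the expectation integrates over both $\nu \sim Q$ and the interaction of the learner with $\nu$.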
Proof Abbreviate $\mu_i = \mu_i(\nu)$ and let $A^* = \operatorname{argmax}_{i \in [k]} \mu_i$ be the optimal arm, which depends on $\nu$ and is a random variable. When there are ties, the definition of $A^*$ uses the same tie-breaking rule as the algorithm. For each $t \in [n]$ and $i \in [k]$ let
$$U_t(i) = \operatorname{clip}_{[0,1]}\left( \hat{\mu}_i(t-1) + \sqrt{\frac{2 \log(1/\delta)}{1 \vee T_i(t-1)}} \right),$$
where $\hat{\mu}_i(t-1)$ is the empirical estimate of the reward of arm $i$ after $t-1$ rounds and we assume $\hat{\mu}_i(t-1) = 0$ if $T_i(t-1) = 0$. Let $E$ be the event that for all $t \in [n]$ and $i \in [k]$,
$$\left|\hat{\mu}_i(t-1) - \mu_i\right| < \sqrt{\frac{2 \log(1/\delta)}{1 \vee T_i(t-1)}}.$$
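Note that both the index $U_t(i)$ and the event $E$ are built from the same confidence width $\sqrt{2\log(1/\delta)/(1 \vee T_i(t-1))}$; the index is exactly the clipped UCB index, which is what makes the comparison with the UCB analysis possible. Purely as an illustration (the function names and array inputs are hypothetical), these quantities could be computed from running counts and empirical means as follows.

    import numpy as np

    def confidence_width(counts, delta):
        # Width sqrt(2 log(1/delta) / (1 v T_i(t-1))) shared by U_t(i) and the event E.
        return np.sqrt(2.0 * np.log(1.0 / delta) / np.maximum(counts, 1))

    def ucb_index(emp_means, counts, delta):
        # Clipped index U_t(i); arms never played use empirical mean 0 by convention.
        means = np.where(counts > 0, emp_means, 0.0)
        return np.clip(means + confidence_width(counts, delta), 0.0, 1.0)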
In Exercise 36.2 we ask you to prove that $\mathbb{P}(E^c) \le nk\delta$. Let $\mathcal{F}_t =$