Bandit Algorithms

(Jeff_L) #1
36.2 Frequentist analysis 433

We start with a straightforward decomposition:


E[Ti(n)] =E

[n

t=1

I{At=i}

]


=E


[n

t=1

I{At=i,Ei(t)}

]


+E


[n

t=1

I{At=i,Eic(t)}

]


. (36.4)


In order to bound the first term letA′t= argmaxi 6 =1θi(t). Then

P(At= 1,Ei(t)|Ft− 1 )≥P(A′t=i,Ei(t),θ 1 (t)≥μ 1 −ε|Ft− 1 )
=P(θ 1 (t)≥μ 1 −ε|Ft− 1 )P(A′t=i,Ei(t)|Ft− 1 )


G 1 T 1 (t−1)
1 −G 1 T 1 (t−1)

P(At=i,Ei(t)|Ft− 1 ), (36.5)

where in the first equality we used the fact thatθ 1 (t) is conditionally independent
ofA′tandEi(t) givenFt− 1. In the second inequality we used the definition of
G 1 sand the fact that


P(At=i,Ei(t)|Ft− 1 )≤(1−P(θ 1 (t)≥μ 1 −ε|Ft− 1 ))P(A′t=i,Ei(t)|Ft− 1 ),

which is true since{At=i,Ei(t)occurs} ⊆ {A′t=i,Ei(t)occurs}∩{θ 1 (t)≤
μ 1 −ε}and the two intersected events are conditionally independent givenFt− 1.
Therefore using Eq. (36.5) we have


P(At=i,Ei(t)|Ft− 1 )≤

(


1


G 1 T 1 (t−1)

− 1


)


P(At= 1,Ei(t)|Ft− 1 )


(


1


G 1 T 1 (t−1)

− 1


)


P(At= 1|Ft− 1 ).

Substituting this into the first term in Eq. (36.4) leads to

E


[n

t=1

I{At=i,Ei(t) occurs}

]


≤E


[n

t=1

(


1


G 1 T 1 (t−1)−^1

)


P(At= 1|Ft− 1 )

]


=E


[n

t=1

(


1


G 1 T 1 (t−1)−^1

)


I{At= 1}

]


≤E


[n− 1

s=0

(


1


G 1 s

− 1


)]


, (36.6)


where in the last step we used the fact thatT 1 (t−1) =sis only possible for one
round whereAt= 1. LetT ={t∈[n] : 1−FiTi(t−1)(μ 1 −ε)> 1 /n}. After some

Free download pdf