Bandit Algorithms

36.2 Frequentist analysis 433

We start with a straightforward decomposition:

E[Ti(n)] =E

[n ∑

t=1

I{At=i}

]

=E

[n ∑

t=1

I{At=i,Ei(t)}

]

+E

[n ∑

t=1

I{At=i,Eic(t)}

]

. (36.4)

In order to bound the first term letA′t= argmaxi 6 =1θi(t). Then

P(At= 1,Ei(t)|Ft− 1 )≥P(A′t=i,Ei(t),θ 1 (t)≥μ 1 −ε|Ft− 1 ) =P(θ 1 (t)≥μ 1 −ε|Ft− 1 )P(A′t=i,Ei(t)|Ft− 1 )

≥

G 1 T 1 (t−1) 1 −G 1 T 1 (t−1)

P(At=i,Ei(t)|Ft− 1 ), (36.5)

where in the first equality we used the fact thatθ 1 (t) is conditionally independent
ofA′tandEi(t) givenFt− 1. In the second inequality we used the definition of
G 1 sand the fact that

P(At=i,Ei(t)|Ft− 1 )≤(1−P(θ 1 (t)≥μ 1 −ε|Ft− 1 ))P(A′t=i,Ei(t)|Ft− 1 ),

which is true since{At=i,Ei(t)occurs} ⊆ {A′t=i,Ei(t)occurs}∩{θ 1 (t)≤
μ 1 −ε}and the two intersected events are conditionally independent givenFt− 1.
Therefore using Eq. (36.5) we have

P(At=i,Ei(t)|Ft− 1 )≤

(

1

G 1 T 1 (t−1)

− 1

)

P(At= 1,Ei(t)|Ft− 1 )

≤

(

1

G 1 T 1 (t−1)

− 1

)

P(At= 1|Ft− 1 ).

Substituting this into the first term in Eq. (36.4) leads to

E

[n ∑

t=1

I{At=i,Ei(t) occurs}

]

≤E

[n ∑

t=1

(

1

G 1 T 1 (t−1)−^1

)

P(At= 1|Ft− 1 )

]

=E

[n ∑

t=1

(

1

G 1 T 1 (t−1)−^1

)

I{At= 1}

]

≤E

[n− 1 ∑

s=0

(

1

G 1 s

− 1

)]

, (36.6)

where in the last step we used the fact thatT 1 (t−1) =sis only possible for one
round whereAt= 1. LetT ={t∈[n] : 1−FiTi(t−1)(μ 1 −ε)> 1 /n}. After some

Bandit Algorithms

]

=E

]

+E

]

. (36.4)

(

1

− 1

)

≤

(

1

− 1

)

E

]

≤E

(

1

)

]

=E

(

1

)

]

≤E

(

1

− 1

)]

, (36.6)

Get our desktop app

Company

Features

Documentation

Resources