Bandit Algorithms

(Jeff_L) #1
36.3 Linear bandits 436

O(d



nlog(n) log(n/d)), which matches the upper bound obtained for Lin-UCB
in Corollary 19.3.


Proof We apply the same technique as used in the proof of Theorem 36.1. Define
the upper confidence bound functionUt:A→Rby


Ut(a) =〈a,θˆt− 1 〉+β‖a‖Vt−− 11 , whereVt=^1
S^2

I+


∑t

s=1

AsA>s.

By Theorem 20.5 and Eq. (20.9),P(existst≤n:‖θˆt− 1 −θ‖Vt− 1 > β)≤ 1 /n. Let
Etbe the event that‖θˆt− 1 −θ‖Vt− 1 ≤β,E=


⋂n
t=1EtandA

∗=argmaxa∈A〈a,θ〉.

Note thatA∗is a random variable becauseθis random. Then


BRn=E

[n

t=1

〈A∗−At,θ〉

]


=E


[


IEc

∑n

t=1

〈A∗−At,θ〉

]


+E


[


IE


∑n

t=1

〈A∗−At,θ〉

]


≤2 +E


[


IE


∑n

t=1

〈A∗−At,θ〉

]


≤2 +E


[n

t=1

IEt〈A∗−At,θ〉

]


. (36.7)


Let Ft = σ(A 1 ,X 1 ,...,At,Xt) and let Et− 1 [·] = E[·|Ft− 1 ]. As before,
P(A∗=·|Ft− 1 )=P(At=·|Ft− 1 ), andUt(a), for any fixeda∈ A, isFt− 1 -
measurable and soEt− 1 (Ut(A∗)) =Et− 1 (Ut(At)). It follows that the second term
in the above display is bounded by


Et− 1 [IEt〈A∗−At,θ〉] =IEtEt− 1 [〈A∗,θ〉−Ut(A∗) +Ut(At)−〈At,θ〉]
≤IEtEt− 1 [Ut(At)−〈At,θ〉]
≤IEtEt− 1

[


〈At,θˆt− 1 −θ〉

]


+β‖At‖Vt− 1

≤IEtEt− 1

[


‖At‖Vt− 1 ‖θˆt− 1 −θ‖Vt

]


+β‖At‖Vt− 1

≤ 2 β‖At‖Vt− 1.

Substituting this combined withIEt〈A∗−At,θ〉 ≤2 into the second term of
Eq. (36.7), we get


E


[n

t=1

IEt〈A∗−At,θ〉

]


≤ 2 βE

[n

t=1

(1∧‖At‖Vt−^1 )

]


≤ 2


√√


√√


nβ^2 E

[n

t=1

(1∧‖At‖^2 V− 1
t

)


]


(Cauchy-Schwarz)

≤ 2



2 dnβ^2 E

[


log

(


1 +


nS^2 L^2
d

)]


. (Lemma 19.4)

Free download pdf