36.3 Linear bandits 436O(d
√
nlog(n) log(n/d)), which matches the upper bound obtained for Lin-UCB
in Corollary 19.3.
Proof We apply the same technique as used in the proof of Theorem 36.1. Define
the upper confidence bound functionUt:A→Rby
Ut(a) =〈a,θˆt− 1 〉+β‖a‖Vt−− 11 , whereVt=^1
S^2I+
∑ts=1AsA>s.By Theorem 20.5 and Eq. (20.9),P(existst≤n:‖θˆt− 1 −θ‖Vt− 1 > β)≤ 1 /n. Let
Etbe the event that‖θˆt− 1 −θ‖Vt− 1 ≤β,E=
⋂n
t=1EtandA∗=argmaxa∈A〈a,θ〉.Note thatA∗is a random variable becauseθis random. Then
BRn=E[n
∑t=1〈A∗−At,θ〉]
=E
[
IEc∑nt=1〈A∗−At,θ〉]
+E
[
IE
∑nt=1〈A∗−At,θ〉]
≤2 +E
[
IE
∑nt=1〈A∗−At,θ〉]
≤2 +E
[n
∑t=1IEt〈A∗−At,θ〉]
. (36.7)
Let Ft = σ(A 1 ,X 1 ,...,At,Xt) and let Et− 1 [·] = E[·|Ft− 1 ]. As before,
P(A∗=·|Ft− 1 )=P(At=·|Ft− 1 ), andUt(a), for any fixeda∈ A, isFt− 1 -
measurable and soEt− 1 (Ut(A∗)) =Et− 1 (Ut(At)). It follows that the second term
in the above display is bounded by
Et− 1 [IEt〈A∗−At,θ〉] =IEtEt− 1 [〈A∗,θ〉−Ut(A∗) +Ut(At)−〈At,θ〉]
≤IEtEt− 1 [Ut(At)−〈At,θ〉]
≤IEtEt− 1[
〈At,θˆt− 1 −θ〉]
+β‖At‖Vt− 1≤IEtEt− 1[
‖At‖Vt− 1 ‖θˆt− 1 −θ‖Vt]
+β‖At‖Vt− 1≤ 2 β‖At‖Vt− 1.Substituting this combined withIEt〈A∗−At,θ〉 ≤2 into the second term of
Eq. (36.7), we get
E
[n
∑t=1IEt〈A∗−At,θ〉]
≤ 2 βE[n
∑t=1(1∧‖At‖Vt−^1 )]
≤ 2
√√
√√
nβ^2 E[n
∑t=1(1∧‖At‖^2 V− 1
t)
]
(Cauchy-Schwarz)≤ 2
√
2 dnβ^2 E[
log(
1 +
nS^2 L^2
d)]
. (Lemma 19.4)