36.3 Linear bandits 436
O(d
√
nlog(n) log(n/d)), which matches the upper bound obtained for Lin-UCB
in Corollary 19.3.
Proof We apply the same technique as used in the proof of Theorem 36.1. Define
the upper confidence bound functionUt:A→Rby
Ut(a) =〈a,θˆt− 1 〉+β‖a‖Vt−− 11 , whereVt=^1
S^2
I+
∑t
s=1
AsA>s.
By Theorem 20.5 and Eq. (20.9),P(existst≤n:‖θˆt− 1 −θ‖Vt− 1 > β)≤ 1 /n. Let
Etbe the event that‖θˆt− 1 −θ‖Vt− 1 ≤β,E=
⋂n
t=1EtandA
∗=argmaxa∈A〈a,θ〉.
Note thatA∗is a random variable becauseθis random. Then
BRn=E
[n
∑
t=1
〈A∗−At,θ〉
]
=E
[
IEc
∑n
t=1
〈A∗−At,θ〉
]
+E
[
IE
∑n
t=1
〈A∗−At,θ〉
]
≤2 +E
[
IE
∑n
t=1
〈A∗−At,θ〉
]
≤2 +E
[n
∑
t=1
IEt〈A∗−At,θ〉
]
. (36.7)
Let Ft = σ(A 1 ,X 1 ,...,At,Xt) and let Et− 1 [·] = E[·|Ft− 1 ]. As before,
P(A∗=·|Ft− 1 )=P(At=·|Ft− 1 ), andUt(a), for any fixeda∈ A, isFt− 1 -
measurable and soEt− 1 (Ut(A∗)) =Et− 1 (Ut(At)). It follows that the second term
in the above display is bounded by
Et− 1 [IEt〈A∗−At,θ〉] =IEtEt− 1 [〈A∗,θ〉−Ut(A∗) +Ut(At)−〈At,θ〉]
≤IEtEt− 1 [Ut(At)−〈At,θ〉]
≤IEtEt− 1
[
〈At,θˆt− 1 −θ〉
]
+β‖At‖Vt− 1
≤IEtEt− 1
[
‖At‖Vt− 1 ‖θˆt− 1 −θ‖Vt
]
+β‖At‖Vt− 1
≤ 2 β‖At‖Vt− 1.
Substituting this combined withIEt〈A∗−At,θ〉 ≤2 into the second term of
Eq. (36.7), we get
E
[n
∑
t=1
IEt〈A∗−At,θ〉
]
≤ 2 βE
[n
∑
t=1
(1∧‖At‖Vt−^1 )
]
≤ 2
√√
√√
nβ^2 E
[n
∑
t=1
(1∧‖At‖^2 V− 1
t
)
]
(Cauchy-Schwarz)
≤ 2
√
2 dnβ^2 E
[
log
(
1 +
nS^2 L^2
d
)]
. (Lemma 19.4)