19.3 Regret analysis 232
The proof of Theorem 19.2 depends on the following lemma.
Lemma19.4.LetV 0 be positive definite andx 1 ,...,xn∈Rdbe a sequence of
vectors with‖xt‖ 2 ≤L <∞for allt∈[n]. Then
∑n
t=1
(
1 ∧‖xt‖^2 Vt−− (^11)
)
≤2 log
(
detVn
detV 0
)
≤ 2 dlog
(
traceV 0 +nL^2
ddet^1 /d(V 0 )
)
.
Proof Using that for anyu∈[0,1],u∧ 1 ≤2 ln(1 +u), we get
∑n
t=1
(
1 ∧‖xt‖^2 Vt−− (^11)
)
≤ 2
∑
t
log
(
1 +‖xt‖^2 Vt−− (^11)
)
.
We now argue that this last expression is log
(
detVn
detV 0
)
. Fort≥1 we have
Vt=Vt− 1 +xtx>t =Vt^1 −/ 12 (I+Vt−−^11 /^2 xtx>tVt−−^11 /^2 )Vt^1 −/ 12.
Hence
det(Vt) = det(Vt− 1 ) det
(
I+Vt−−^11 /^2 xtx>tVt−−^11 /^2
)
= det(Vt− 1 )
(
1 +‖xt‖^2 V− 1
t− 1
)
,
where the second equality follows because the matrixI+yy>has eigenvalues
1 +‖y‖^22 and 1 as well as the fact that the determinant of a matrix is the product
of its eigenvalues. Putting things together we see that
det(Vn) = det(V 0 )
∏n
t=1
(
1 +‖xt‖^2 Vt−− (^11)
)
,
which is equivalent to the first inequality that we wanted to prove. To get the
second inequality note that by the inequality of arithmetic and geometric means,
det(Vn) =
∏d
i=1
λi≤
(
1
d
traceVn
)d
≤
(
traceV 0 +nL^2
d
)d
,
whereλ 1 ,...,λdare the eigenvalues ofVn.
Proof of Theorem 19.2 By part (c) of Assumption 19.1 it suffices to prove the
bound on the event thatθ∗∈Ctfor all roundst∈[n]. LetA∗t=argmaxa∈At〈a,θ∗〉
be an optimal action for roundtandrtbe the instantaneous regret in roundt
defined by
rt=〈A∗t−At,θ∗〉.
Letθ ̃t∈Ctbe the parameter in the confidence set for which〈At,θ ̃t〉=UCBt(At).
Then using the fact thatθ∗∈Ctand the definition of the algorithm leads to
〈A∗t,θ∗〉≤UCBt(A∗t)≤UCBt(At) =〈At,θ ̃t〉.
Using Cauchy-Schwarz inequality and the assumption thatθ∗∈Ctand facts that
θ ̃t∈CtandCt⊆Etleads to
rt=〈At∗−At,θ∗〉≤〈At,θ ̃t−θ∗〉≤‖At‖Vt−− 11 ‖θ ̃t−θ∗‖Vt− 1 ≤ 2 ‖At‖Vt−− (^11)
√
βt.