Bandit Algorithms

(Jeff_L) #1
Confidence Bounds for Least Squares Estimators 239

Comparingθ∗andθˆtin the directionx∈Rdwe have

〈x,θˆt−θ∗〉=


x,Vt−^1

∑t

s=1

AsXs−θ∗


=



x,Vt−^1

∑t

s=1

As

(


a>sθ∗+ηs

)


−θ∗


=



x,Vt−^1

∑t

s=1

Asηs


=


∑t

s=1


x,Vt−^1 As


ηs.

Since (ηs)sare independent and 1-subgaussian, by Lemma 5.4 and Theorem 5.3,

P





x,θˆt−θ∗



√√


√√


2


∑t

s=1


x,Vt−^1 As

〉 2


log

(


1


δ

)


≤δ.

A little linear algebra shows that


∑t
s=1


x,Vt−^1 As

〉 2


=‖x‖^2 Vt− 1 and so

P


(


〈x,θˆt−θ∗〉≥


2 ‖x‖^2 V− 1
t

log

(


1


δ

))


≤δ. (20.2)

The next step is to convert the above bound on〈x,θˆt−θ∗〉to a bound on
‖θˆt−θ∗‖Vt. To begin this process notice that


‖θˆt−θ∗‖Vt=〈Vt^1 /^2 X,θˆt−θ∗〉,whereX=

Vt^1 /^2 (θˆt−θ∗)
‖θˆt−θ∗‖Vt

.


The problem is that X is random while we have only proven (20.2) for
deterministicx. The standard way of addressing problems like this is to use
acovering argument. First we identify a finite setCε⊂Rdsuch that whatever
valueXtakes, there exists somex∈ Cεthat isε-close toX. Then a union
bound and a triangle inequality allows one to finish. By its definition we have
‖X‖^22 =X>X= 1, which means thatX∈Sd−^1 ={x∈Rd:‖x‖ 2 = 1}. Using
thatX∈Sd−^1 we see it suffices to coverSd−^1. The following lemma provides
the necessary guarantees on the size of the covering set.


Lemma20.1. There exists a setCε⊂Rdwith|Cε| ≤(3/ε)dsuch that for all
x∈Sd−^1 there exists ay∈Cεwith‖x−y‖ 2 ≤ε.

The proof of this lemma requires a bit work, but nothing really deep is needed.
This work is deferred to Exercises 20.3 and 20.4. LetCεbe the covering set given
by the lemma and define event


E=


{


existsx∈Cε:


Vt^1 /^2 x,ˆθt−θ∗




2 log

(


|Cε|
δ

)}


.


Using the fact that‖Vt^1 /^2 x‖Vt− 1 =‖x‖ 2 = 1 and a union bound combined with
Eq. (20.2) shows thatP(E)≤δ. WhenEdoes not occur Cauchy-Schwarz shows
Free download pdf