Confidence Bounds for Least Squares Estimators 239
Comparingθ∗andθˆtin the directionx∈Rdwe have
〈x,θˆt−θ∗〉=
〈
x,Vt−^1
∑t
s=1
AsXs−θ∗
〉
=
〈
x,Vt−^1
∑t
s=1
As
(
a>sθ∗+ηs
)
−θ∗
〉
=
〈
x,Vt−^1
∑t
s=1
Asηs
〉
=
∑t
s=1
〈
x,Vt−^1 As
〉
ηs.
Since (ηs)sare independent and 1-subgaussian, by Lemma 5.4 and Theorem 5.3,
P
〈
x,θˆt−θ∗
〉
≥
√√
√√
2
∑t
s=1
〈
x,Vt−^1 As
〉 2
log
(
1
δ
)
≤δ.
A little linear algebra shows that
∑t
s=1
〈
x,Vt−^1 As
〉 2
=‖x‖^2 Vt− 1 and so
P
(
〈x,θˆt−θ∗〉≥
√
2 ‖x‖^2 V− 1
t
log
(
1
δ
))
≤δ. (20.2)
The next step is to convert the above bound on〈x,θˆt−θ∗〉to a bound on
‖θˆt−θ∗‖Vt. To begin this process notice that
‖θˆt−θ∗‖Vt=〈Vt^1 /^2 X,θˆt−θ∗〉,whereX=
Vt^1 /^2 (θˆt−θ∗)
‖θˆt−θ∗‖Vt
.
The problem is that X is random while we have only proven (20.2) for
deterministicx. The standard way of addressing problems like this is to use
acovering argument. First we identify a finite setCε⊂Rdsuch that whatever
valueXtakes, there exists somex∈ Cεthat isε-close toX. Then a union
bound and a triangle inequality allows one to finish. By its definition we have
‖X‖^22 =X>X= 1, which means thatX∈Sd−^1 ={x∈Rd:‖x‖ 2 = 1}. Using
thatX∈Sd−^1 we see it suffices to coverSd−^1. The following lemma provides
the necessary guarantees on the size of the covering set.
Lemma20.1. There exists a setCε⊂Rdwith|Cε| ≤(3/ε)dsuch that for all
x∈Sd−^1 there exists ay∈Cεwith‖x−y‖ 2 ≤ε.
The proof of this lemma requires a bit work, but nothing really deep is needed.
This work is deferred to Exercises 20.3 and 20.4. LetCεbe the covering set given
by the lemma and define event
E=
{
existsx∈Cε:
〈
Vt^1 /^2 x,ˆθt−θ∗
〉
≥
√
2 log
(
|Cε|
δ
)}
.
Using the fact that‖Vt^1 /^2 x‖Vt− 1 =‖x‖ 2 = 1 and a union bound combined with
Eq. (20.2) shows thatP(E)≤δ. WhenEdoes not occur Cauchy-Schwarz shows