Confidence Bounds for Least Squares Estimators 239Comparingθ∗andθˆtin the directionx∈Rdwe have〈x,θˆt−θ∗〉=〈
x,Vt−^1∑ts=1AsXs−θ∗〉
=
〈
x,Vt−^1∑ts=1As(
a>sθ∗+ηs)
−θ∗〉
=
〈
x,Vt−^1∑ts=1Asηs〉
=
∑ts=1〈
x,Vt−^1 As〉
ηs.Since (ηs)sare independent and 1-subgaussian, by Lemma 5.4 and Theorem 5.3,P
〈
x,θˆt−θ∗〉
≥
√√
√√
2
∑ts=1〈
x,Vt−^1 As〉 2
log(
1
δ)
≤δ.A little linear algebra shows that
∑t
s=1〈
x,Vt−^1 As〉 2
=‖x‖^2 Vt− 1 and soP
(
〈x,θˆt−θ∗〉≥√
2 ‖x‖^2 V− 1
tlog(
1
δ))
≤δ. (20.2)The next step is to convert the above bound on〈x,θˆt−θ∗〉to a bound on
‖θˆt−θ∗‖Vt. To begin this process notice that
‖θˆt−θ∗‖Vt=〈Vt^1 /^2 X,θˆt−θ∗〉,whereX=Vt^1 /^2 (θˆt−θ∗)
‖θˆt−θ∗‖Vt.
The problem is that X is random while we have only proven (20.2) for
deterministicx. The standard way of addressing problems like this is to use
acovering argument. First we identify a finite setCε⊂Rdsuch that whatever
valueXtakes, there exists somex∈ Cεthat isε-close toX. Then a union
bound and a triangle inequality allows one to finish. By its definition we have
‖X‖^22 =X>X= 1, which means thatX∈Sd−^1 ={x∈Rd:‖x‖ 2 = 1}. Using
thatX∈Sd−^1 we see it suffices to coverSd−^1. The following lemma provides
the necessary guarantees on the size of the covering set.
Lemma20.1. There exists a setCε⊂Rdwith|Cε| ≤(3/ε)dsuch that for all
x∈Sd−^1 there exists ay∈Cεwith‖x−y‖ 2 ≤ε.The proof of this lemma requires a bit work, but nothing really deep is needed.
This work is deferred to Exercises 20.3 and 20.4. LetCεbe the covering set given
by the lemma and define event
E=
{
existsx∈Cε:〈
Vt^1 /^2 x,ˆθt−θ∗〉
≥
√
2 log(
|Cε|
δ)}
.
Using the fact that‖Vt^1 /^2 x‖Vt− 1 =‖x‖ 2 = 1 and a union bound combined with
Eq. (20.2) shows thatP(E)≤δ. WhenEdoes not occur Cauchy-Schwarz shows