Bandit Algorithms

(Jeff_L) #1
This material will be published by Cambridge University Press as Bandit
Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version
is free to view and download for personal use only. Not for re-distribution, sale,
or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017.
The latest version is available athttp://banditalgs.com. Feedback on any
aspect is very welcome: [email protected]

21 Optimal Design for Least Squares Estimators


In the preceding chapter we showed how to construct confidence intervals for
least squares estimators when the design is chosen sequentially. We now study
the problem of choosing actions for which the resulting confidence sets are small.
This plays an important role in the analysis of stochastic linear bandits with
finitely many arms (Chapter 22) and adversarial linear bandits (Part VI).

21.1 The Kiefer–Wolfowitz theorem


Letη 1 ,...,ηnbe a sequence of independent 1-subgaussian random variables and
a 1 ,...,an∈Rdbe a fixed sequence withspan(a 1 ,...,an) =RdandX 1 ,...,Xn
be given byXt=〈at,θ∗〉+ηtfor someθ∗∈Rd. The least squares estimator of
θ∗isθˆ=V−^1

∑n
t=1atXtwithV=

∑n
t=1ata

>t.

The least squares estimator used here is not regularized. This eases the
calculations and the lack of regularization will not harm us in future
applications.

Eq. (20.2) from Chapter 20 shows that for anya∈Rdandδ∈(0,1),

P


(


〈θˆ−θ∗,a〉≥


2 ‖a‖^2 V− 1 log

(


1


δ

))


≤δ. (21.1)

For our purposes, botha 1 ,...,anandawill be actions from some (possibly
infinite) setA⊂Rdand the question of interest is finding the shortest sequence
of exploratory actionsa 1 ,...,ansuch that the confidence bound in the previous
display is smaller than some threshold for alla∈ A. To solve this exactly
is likely an intractable exercise in integer programming. Finding an accurate
approximation turns out to be efficient for a broad class of action sets, however.
Letπ:A→[0,1] be a distribution onAso that


a∈Aπ(a) = 1 andV(π)∈R

d×d
andg(π)∈Rbe given by

V(π) =


a∈A

π(a)aa>, g(π) = max
a∈A

‖a‖^2 V(π)− 1. (21.2)
Free download pdf