Bandit Algorithms

(Jeff_L) #1
38.3 Finding an optimal policy ( ) 485

innonly, provided that the constraints satisfy certain structural properties. Let
K⊂Rnbe convex and consider the optimization problem

minimizex∈Rn 〈c,x〉 (38.8)

subject tox∈K.

Algorithms for this problem generally have a slightly different flavor becauseK
may have no corners. Suppose the following holds:


(a) There exists a knownR >0 such thatK⊂{x∈Rn:‖x‖ 2 ≤R}.
(b)There exists a separation oracle, which we recall from Chapter 27, is a
computational procedure to evaluate some functionφonRnwithφ(x) =true
forx∈ Kand otherwiseφ(x) =uwith〈y,u〉>〈x,u〉for ally∈ K(see
Fig. 27.1).
(c) There exists aδ >0 andx 0 ∈Rdsuch that{x∈Rn:‖x−x 0 ‖ 2 ≤δ}⊂K.


Under these circumstances the ellipsoid method accepts as input the size of the
bounding sphereR, the separation oracle and an accuracy parameterε >0. Its
output is a pointxin time polynomial innandlog(R/(δε)) such thatx∈Kand
〈c,x〉≤〈c,x∗〉+εwherex∗is the minimizer of Eq. (38.8). The reader can find
references to this method at the end of the chapter.
The linear programs in Eq. (38.6) and Eq. (38.7) do not have bounded feasible
regions because ifvis feasible, thenv+c 1 is also feasible for anyc∈R. For
strongly connected MDPs with diameterD, however, Lemma 38.3 allows us to
add the constraint that‖v‖∞≤D. If the rewards are bounded in [0,1], then we
may also add the constraint that 0≤ρ≤1. Together these imply that for (ν,ρ)
in the feasible region,

‖(ρ,v)‖^22 =ρ^2 +‖v‖^22 ≤1 +S‖v‖^2 ∞≤1 +SD^2.

Then setR=



1 +D^2 S. When the diameter is unknown one may use a doubling
procedure. In order to guarantee the feasible region contains a small ball we
add some slack to the constraints. Letε >0 and consider the following linear
program.

minimize
ρ∈R,v∈RS

ρ (38.9)

subject to ε+ρ+v(s)≥ra(s) +〈Pa(s),v〉for alls,a.
v(s)≥−Dfor alls
v(s)≤Dfor alls
ρ≤1 +εfor alls
ρ≥−εfor alls.

Note that for anyxin the feasible region of Eq. (38.9) there exists aythat is
feasible for Eq. (38.6) with‖x−y‖∞≤ε. Furthermore, the solution to the above
linear program is at mostεaway from the solution to Eq. (38.6). What we have
Free download pdf