Bandit Algorithms


The best way to think about the standard adversarial linear model is that it
generalizes the stochastic linear bandit with parameter noise. Linear bandits
with parameter noise are sometimes easier than the standard model because
parameter noise limits the adversary’s control of the signal-to-noise ratio
experienced by the learner.
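The effect of parameter noise on the signal-to-noise ratio can be illustrated numerically. The following is a minimal sketch, not taken from the book: the model $Y_t = \langle A_t, \theta + \xi_t \rangle$, the particular $\theta$, and the constants are all illustrative assumptions. Under parameter noise the noise standard deviation scales with $\|A_t\|$, so a small action cannot be paired with large noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
theta = np.array([1.0, 0.0])  # illustrative unknown parameter

def reward_standard(a, sigma=1.0):
    # Standard model: additive noise independent of the action, so a
    # short action vector has a poor signal-to-noise ratio.
    return a @ theta + sigma * rng.standard_normal()

def reward_param_noise(a, sigma=1.0):
    # Parameter noise: Y_t = <a, theta + xi_t>; the noise standard
    # deviation scales with ||a||, tying signal and noise together.
    xi = sigma * rng.standard_normal(d)
    return a @ (theta + xi)

a = np.array([0.01, 0.0])  # a "small" action with tiny signal <a, theta>
std_standard = np.std([reward_standard(a) for _ in range(10000)])
std_param = np.std([reward_param_noise(a) for _ in range(10000)])
print(std_standard, std_param)
```

With standard noise the small action's observation is almost pure noise, while under parameter noise the noise shrinks in proportion to the action, which is one sense in which parameter noise limits the adversary's control over the signal-to-noise ratio.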

29.4 Contextual linear bandits


In practical applications the action set is usually changing from round to round.
Although it is possible to prove bounds for adversarial linear bandits with changing
action sets, the notion of regret makes the results less meaningful than what
one obtains in the stochastic setting. Suppose that $(\mathcal{A}_t)_{t=1}^n$ is a sequence of action sets. In the stochastic setting, the actions $(A_t)_t$ selected by the LinUCB algorithm satisfy
$$
\mathbb{E}\left[ \sum_{t=1}^n \langle A_t - a_t^*, \theta \rangle \right] = \tilde{O}(d\sqrt{n})\,,
$$
where $a_t^* = \operatorname{argmax}_{a \in \mathcal{A}_t} \langle a, \theta \rangle$ is the optimal action in round $t$. This definition of the regret really measures the right thing: the action $a_t^*$ really is the optimal action in round $t$. The analogous result for adversarial bandits would be a bound on

$$
R_n(\Theta) = \max_{\theta \in \Theta} \mathbb{E}\left[ \sum_{t=1}^n \langle A_t - a_t(\theta), y_t \rangle \right] , \qquad (29.4)
$$


where $\Theta$ is a subset of $\mathbb{R}^d$ and $a_t(\theta) = \operatorname{argmax}_{a \in \mathcal{A}_t} \langle a, \theta \rangle$. Unfortunately, however, we do not currently know how to design algorithms for which this regret is small. For finite $\Theta$ the techniques of Chapter 27 are easily adapted to prove a bound of $O(\sqrt{dn \log |\Theta|})$, but this algorithm is (a) not computationally efficient for large $|\Theta|$ and (b) choosing $\Theta$ as an $\varepsilon$-covering of a continuous set does not guarantee a bound against the larger set. Providing a meaningful bound on Eq. (29.4) when $\Theta$ is a continuous set like $\{\theta : \|\theta\|_2 \leq 1\}$ is a fascinating challenge. The reader may recall that the result in Exercise 27.5 provides a bound for adversarial linear bandits with changing action sets. However, in this problem the actions have 'identities' and the regret is measured with respect to the best action in hindsight, which is a markedly different objective than the one in Eq. (29.4).
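To make the stochastic notion of regret concrete, here is a minimal LinUCB-style simulation with a fresh, finite action set in every round. This is a hypothetical sketch, not the book's pseudocode: the parameter $\theta$, the regularizer, and the confidence width `beta` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lam, beta = 2, 2000, 1.0, 2.0
theta = np.array([1.0, -0.5])  # illustrative unknown parameter

V = lam * np.eye(d)  # regularized Gram matrix of played actions
b = np.zeros(d)      # sum of reward-weighted actions
regret = 0.0

for t in range(n):
    # A fresh action set A_t each round: the contextual setting.
    A_t = rng.standard_normal((5, d))
    theta_hat = np.linalg.solve(V, b)   # ridge estimate of theta
    V_inv = np.linalg.inv(V)
    # Optimistic index: <a, theta_hat> + beta * ||a||_{V^{-1}}.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', A_t, V_inv, A_t))
    a = A_t[np.argmax(A_t @ theta_hat + beta * widths)]
    y = a @ theta + 0.1 * rng.standard_normal()  # noisy observation
    V += np.outer(a, a)
    b += y * a
    # Per-round regret against the best action in A_t, i.e. the
    # round-t comparator a_t^* from the displayed bound.
    regret += np.max(A_t @ theta) - a @ theta

print(regret / n)  # average regret per round
```

The final line reports the average regret per round against the best action available in each round, the same comparator used in the stochastic bound above; it is this per-round, changing comparator that has no satisfying adversarial analogue.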

29.5 Notes


1 For the reduction in Section 29.2 we assumed that $|Y_t| \leq 1$ almost surely. This is not true for many classical noise models like the Gaussian. One way to