Bandit Algorithms


The best way to think about the standard adversarial linear model is that it
generalizes the stochastic linear bandit with parameter noise. Linear bandits
with parameter noise are sometimes easier than the standard model because
parameter noise limits the adversary’s control of the signal-to-noise ratio
experienced by the learner.
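The effect of parameter noise on the signal-to-noise ratio can be illustrated numerically. The following is a minimal sketch, not taken from the book: the model $Y_t = \langle A_t, \theta + \xi_t \rangle$, the particular $\theta$, and the constants are all illustrative assumptions. Under parameter noise the noise standard deviation scales with $\|A_t\|$, so a small action cannot be paired with large noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
theta = np.array([1.0, 0.0])  # illustrative unknown parameter

def reward_standard(a, sigma=1.0):
    # Standard model: additive noise independent of the action, so a
    # short action vector has a poor signal-to-noise ratio.
    return a @ theta + sigma * rng.standard_normal()

def reward_param_noise(a, sigma=1.0):
    # Parameter noise: Y_t = <a, theta + xi_t>; the noise standard
    # deviation scales with ||a||, tying signal and noise together.
    xi = sigma * rng.standard_normal(d)
    return a @ (theta + xi)

a = np.array([0.01, 0.0])  # a "small" action with tiny signal <a, theta>
std_standard = np.std([reward_standard(a) for _ in range(10000)])
std_param = np.std([reward_param_noise(a) for _ in range(10000)])
print(std_standard, std_param)
```

With standard noise the small action's observation is almost pure noise, while under parameter noise the noise shrinks in proportion to the action, which is one sense in which parameter noise limits the adversary's control over the signal-to-noise ratio.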

29.4 Contextual linear bandits


In practical applications the action set is usually changing from round to round.
Although it is possible to prove bounds for adversarial linear bandits with changing
action sets, the notion of regret makes the results less meaningful than what
one obtains in the stochastic setting. Suppose that $(\mathcal{A}_t)_{t=1}^n$ is a sequence of action sets. In the stochastic setting, the actions $(A_t)_t$ selected by the LinUCB algorithm satisfy
$$
\mathbb{E}\left[ \sum_{t=1}^n \langle A_t - a_t^*, \theta \rangle \right] = \tilde{O}(d\sqrt{n})\,,
$$
where $a_t^* = \operatorname{argmax}_{a \in \mathcal{A}_t} \langle a, \theta \rangle$ is the optimal action in round $t$. This definition of the regret really measures the right thing: the action $a_t^*$ really is the optimal action in round $t$. The analogous result for adversarial bandits would be a bound on

$$
R_n(\Theta) = \max_{\theta \in \Theta} \mathbb{E}\left[ \sum_{t=1}^n \langle A_t - a_t(\theta), y_t \rangle \right] , \qquad (29.4)
$$


where $\Theta$ is a subset of $\mathbb{R}^d$ and $a_t(\theta) = \operatorname{argmax}_{a \in \mathcal{A}_t} \langle a, \theta \rangle$. Unfortunately, however, we do not currently know how to design algorithms for which this regret is small. For finite $\Theta$ the techniques of Chapter 27 are easily adapted to prove a bound of $O(\sqrt{dn \log |\Theta|})$, but this algorithm is (a) not computationally efficient for large $|\Theta|$ and (b) choosing $\Theta$ as an $\varepsilon$-covering of a continuous set does not guarantee a bound against the larger set. Providing a meaningful bound on Eq. (29.4) when $\Theta$ is a continuous set like $\{\theta : \|\theta\|_2 \leq 1\}$ is a fascinating challenge. The reader may recall that the result in Exercise 27.5 provides a bound for adversarial linear bandits with changing action sets. However, in this problem the actions have 'identities' and the regret is measured with respect to the best action in hindsight, which is a markedly different objective than the one in Eq. (29.4).
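To make the stochastic notion of regret concrete, here is a minimal LinUCB-style simulation with a fresh, finite action set in every round. This is a hypothetical sketch, not the book's pseudocode: the parameter $\theta$, the regularizer, and the confidence width `beta` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lam, beta = 2, 2000, 1.0, 2.0
theta = np.array([1.0, -0.5])  # illustrative unknown parameter

V = lam * np.eye(d)  # regularized Gram matrix of played actions
b = np.zeros(d)      # sum of reward-weighted actions
regret = 0.0

for t in range(n):
    # A fresh action set A_t each round: the contextual setting.
    A_t = rng.standard_normal((5, d))
    theta_hat = np.linalg.solve(V, b)   # ridge estimate of theta
    V_inv = np.linalg.inv(V)
    # Optimistic index: <a, theta_hat> + beta * ||a||_{V^{-1}}.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', A_t, V_inv, A_t))
    a = A_t[np.argmax(A_t @ theta_hat + beta * widths)]
    y = a @ theta + 0.1 * rng.standard_normal()  # noisy observation
    V += np.outer(a, a)
    b += y * a
    # Per-round regret against the best action in A_t, i.e. the
    # round-t comparator a_t^* from the displayed bound.
    regret += np.max(A_t @ theta) - a @ theta

print(regret / n)  # average regret per round
```

The final line reports the average regret per round against the best action available in each round, the same comparator used in the stochastic bound above; it is this per-round, changing comparator that has no satisfying adversarial analogue.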

29.5 Notes


1 For the reduction in Section 29.2 we assumed that $|Y_t| \leq 1$ almost surely. This is not true for many classical noise models like the Gaussian. One way to