Bandit Algorithms

(Jeff_L) #1
37.6 Upper bound for easy games 464

Ca

Cb C ̃a

v

u

w

Figure 37.4The construction used in the proof of Lemma 37.17.

Suppose that Algorithm 25 is run on local ly observableG= (L,Φ)with parameters

η=^1
V


log(k/δ)
nk

and γ=V kη and δ∈(0,1).

Then with probability at least 1 −δthe regret is bounded byRˆn≤CG


nlog(e/δ)),
whereCGis a constant depending onG, but notnorδ.

By choosingδ= 1/nthe following corollary is obtained.

Corollary 37.16. Suppose that Algorithm 25 is run on locally observable
G= (L,Φ)with the same choices ofηandγas Theorem 37.15 andδ= 1/n,
then there exists a constantCG′ depending onG, but notnsuch that

Rn≤CG′


nlog(n).

37.6 Upper bound for easy games


In this section we prove Theorem 37.15. The first step is a simple lemma showing
that the regret can be localised to the neighborhood of the played action.

Lemma37.17.There exists a constantεG> 0 depending only onGsuch that
for all pairs of actionsa,a ̃∈ Aandu∈C ̃athere exists an actionb∈ Na∩A
such that〈`a−`a ̃,u〉≤〈`a−`b,u〉/εG.

Proof Sinceu∈C ̃a, 0≤〈`a−`a ̃,u〉. The result is trivial ifaand ̃aare neighbors
or〈`a−`a ̃,u〉= 0. From now on assume that〈`a−` ̃a,u〉>0 and thata,a ̃are
not neighbors. Letvbe the centroid ofCa. The idea is to chooseb∈Na∩Aas
that neighbor ofawhose cell is the one that the line segment that connectsv
anduenters when leavingCa. To be precise, ifwlies in the intersection of the
line segment connectingvanduand the boundary ofCathenbis a neighbor of
ainAso thatw∈Ca∩Cb. Note thatwis well defined by the Jordan–Brouwer
separation theorem (see the notes at the end of the chapter), andbis well defined
becauseAis a maximal duplicate-free subset of the Pareto optimal actions. Using
Free download pdf