Bandit Algorithms

24.4 Unrealizable case 276

Next define matrixV∈Rp×dto be a block-diagonal matrix with 1×kblocks, each containing the row vector (1, 2 ,...,k)>. For example, whenp= 3, we have

V=





1 ··· k 0 ··· 0 0 ··· 0 0 ··· 0 1 ··· k 0 ··· 0 0 ··· 0 0 ··· 0 1 ··· k



.

LetBt=V At∈[k]prepresent the vector of ‘base’ actions chosen by the learner in each of thepbandits in roundt. The optimal action in theith bandit is b∗i(θ) = argmaxb∈[k]θ(bi).

The regret can be decomposed into the regrets in thep‘base bandit’ problems (a
form of separability, again):

Rn(θ) =

∑p

i=1

∆Eθ

[n ∑

t=1

I{Bti 6 =b∗i}

]

︸︷︷︸

Rni(θ) Fori∈[p] we abbreviateθ(−i)= (θ(1),...,θ(i−1),θ(i+1),...,θ(p)). Then,

1 |Θ|p

∑

θ∈Θp

Rn(θ) =^1 |Θ|p

∑p

i=1

∑

θ∈Θp

Rni(θ)

=

∑p

i=1

1

|Θ|p−^1

∑

θ(−i)∈Θp−^1

1

|Θ|

∑

θ(i)∈Θ

Rni(θ)

≥

1

8

∑p

i=1

1

|Θ|p−^1

∑

θ(−i)∈Θp−^1

√

kn (24.6)

=^1

8

p

√

kn=^1 8

√

dpn.

Here, in the first equality we use the convention thatθdenotes the vector obtained by ‘inserting’θ(i)intoθ(−i)at theith ‘block’. Other than this, the only tricky step is the inequality, which follows by choosing ∆≈

√

k/nand repeating the
argument outlined in Exercise 15.2. We leave it to the reader to check the details
(Exercise 24.1).

24.4 Unrealizable case

An important generalization of the linear model is theunrealizablecase where
the mean rewards are not assumed to follow a linear model exactly. Suppose
thatA ⊂Rdis a finite set with|A| =kand thatXt=ηt+μ(At) where
μ:A→Ris an unknown function. Letθ∈Rdbe the parameter vector for which
supa∈A|〈θ,a〉−μ(a)|is as small as possible:
θ= argminθ∈Rdsup
a∈A

|〈θ,a〉−μ(a)|.

Bandit Algorithms

V=







.

]

︸︷︷︸

∑

∑

=

1

∑

1

|Θ|

∑

≥

1

8

1

∑

√

=^1

8

√

√

√

24.4 Unrealizable case

Get our desktop app

Company

Features

Documentation

Resources

Bandit Algorithms

V=







.

]

︸ ︷︷ ︸

∑

∑

=

1

∑

1

|Θ|

∑

≥

1

8

1

∑

√

=^1

8

√

√

√

24.4 Unrealizable case

Get our desktop app

Company

Features

Documentation

Resources

︸︷︷︸