Bandit Algorithms

(Jeff_L) #1
6.4 Exercises 92

is


2 /m-subgaussian. You should only use the definitions and the interaction
protocol, which states that

(a)P(At∈·|A 1 ,X 1 ,...,At− 1 ,Xt− 1 ) =π(·|A 1 ,X 1 ,...,At− 1 ,Xt− 1 ) a.s.
(b)P(Xt∈·|A 1 ,X 1 ,...,At− 1 ,Xt− 1 ,At) =PAt(·) a.s.


6.2(Minimax regret) Show that Eq. (6.6) implies the regret of an optimally
tuned ETC for subgaussian 2-armed bandits satisfiesRn≤∆ +C


nwhere
C >0 is a universal constant.

6.3(High probability bounds i) Fixδ∈(0,1). Modify the ETC algorithm to
depend onδand prove a bound on the pseudo-regretR ̄n=nμ∗−

∑n
t=1μAtof
ETC that holds with probability 1−δ.

6.4(High probability bounds ii) Fixδ∈(0,1). Prove a bound on the random
regretRˆn=nμ∗−

∑n
t=1Xtof ETC that holds with probability 1−δ. Compare
this to the bound derived for the pseudo-regret in the previous exercise. What
can you conclude?

6.5(Adaptive commitment times) Suppose that ETC interacts with a
2-armed 1-subgaussian bandit with meansμ 1 ,μ 2 ∈Rand ∆ =|μ 1 −μ 2 |.

(a)Find a choice ofmthat only depends on the horizonnand not ∆ such the
regret of Algorithm 1 is bounded by


Rn≤∆ +Cn^2 /^3 ,

whereC >0 is a universal constant.

(b)Now suppose the commitment time is allowed to be data-dependent, which
means the algorithm explores each arm alternately until some condition is
met and then commits to a single arm for the remainder. Design a condition
such that the regret of the resulting algorithm can be bounded by


Rn≤∆ +

Clogn

, (6.7)


whereCis a universal constant. Your condition should only depend on the
observed rewards and the time horizon. It should not depend onμ 1 ,μ 2 or ∆.
(c)Show that any algorithm for which(6.7)holds also satisfiesRn ≤∆ +
C


nlog(n) for suitably chosen universal constantC >0.
(d)As for (b), but now the objective is to design a condition such that the regret
of the resulting algorithm is bounded by


Rn≤∆ +

Clog max

{


e,n∆^2

}



, (6.8)


(e)Show that any algorithm for which(6.8)holds also satisfiesRn≤∆ +C


n
for suitably chosen universal constantC >0.
Free download pdf