Bandit Algorithms

36.7 Exercises

Hint For (a) you may find it useful to know that for $y \ge 0$,
$$1 - \Phi(y) \ge \sqrt{\frac{2}{\pi}}\,\frac{\exp(-y^2/2)}{y + \sqrt{y^2 + 4}}\,,$$
where $\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp(-x^2/2)\,dx$ is the cumulative distribution function of
the standard Gaussian [Abramowitz and Stegun, 1964, §7.1.13].
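The tail bound in the hint can be sanity-checked numerically. This is an illustrative sketch, not part of the exercise: it evaluates $1 - \Phi(y)$ via the complementary error function and compares it against the lower bound derived from Abramowitz and Stegun §7.1.13 (the constant $\sqrt{2/\pi}$ comes from rescaling their erfc bound to the standard Gaussian).

```python
import math

def gaussian_tail(y):
    # 1 - Phi(y) for the standard Gaussian, via 1 - Phi(y) = erfc(y / sqrt(2)) / 2.
    return 0.5 * math.erfc(y / math.sqrt(2))

def lower_bound(y):
    # sqrt(2/pi) * exp(-y^2/2) / (y + sqrt(y^2 + 4)), valid for y >= 0.
    return math.sqrt(2 / math.pi) * math.exp(-y * y / 2) / (y + math.sqrt(y * y + 4))

# The bound holds on a grid of test points and is tight for large y.
for y in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    assert gaussian_tail(y) >= lower_bound(y)
```

For large $y$ the two sides agree to relative order $1/y^4$, which is what makes this bound useful in lower-bound arguments.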
36.7(Properties of mutual information) Prove Lemmas 36.5 and 36.6.
Hint For Lemma 36.5, you may want to use that $H(X \mid G) = \mathbb{E}\left[ H_{P(\cdot \mid G)}(X) \right]$.
For Lemma 36.6, to get a sense of what you need to do, first assume that Y
is also discrete and then generalize your calculations. You may want to recall
Fig. 2.4 and the note next to it at one point in the proof, which ultimately relies
on the factorization lemma (Lemma 2.5).
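For discrete random variables, the identity in the hint, $H(X \mid G) = \mathbb{E}[H_{P(\cdot \mid G)}(X)]$, can be checked numerically against the chain rule $H(X \mid G) = H(X, G) - H(G)$. A minimal sketch with an arbitrary made-up joint distribution:

```python
import math

def entropy(dist):
    # Shannon entropy (in nats) of a discrete distribution given as {outcome: probability}.
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# A small made-up joint distribution p(x, g) with X in {0, 1} and G in {'a', 'b'}.
joint = {(0, 'a'): 0.1, (1, 'a'): 0.3, (0, 'b'): 0.4, (1, 'b'): 0.2}

# Marginal of G and the conditional distributions P(X | G = g).
pg = {}
for (x, g), p in joint.items():
    pg[g] = pg.get(g, 0.0) + p
cond = {g: {x: joint[(x, g)] / pg[g] for x in (0, 1)} for g in pg}

# H(X | G) as the expectation over G of the entropy of the conditional law of X.
h_cond = sum(pg[g] * entropy(cond[g]) for g in pg)

# The same quantity via the chain rule H(X | G) = H(X, G) - H(G).
h_chain = entropy(joint) - entropy(pg)

assert abs(h_cond - h_chain) < 1e-12
```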
36.8 (Mutual information for independent random variables) Suppose
that $X$ and $Y$ are independent random variables. Show that $I(X; Y) = 0$.

The converse of the claim of the previous exercise also holds: if $I(X; Y) = 0$,
then $X$ and $Y$ are independent. Mutual information is thus a measure of
dependence.
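Both directions of this equivalence can be illustrated numerically for discrete random variables. A sketch (the joint distributions below are arbitrary examples): when the joint factorizes as $p(x, y) = p(x)p(y)$, every term in the mutual information sum vanishes, while a non-factorizing joint gives a strictly positive value.

```python
import math

def mutual_information(joint):
    # I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent X and Y: the joint is the product of the marginals.
px = {0: 0.25, 1: 0.75}
py = {0: 0.6, 1: 0.4}
joint_indep = {(x, y): px[x] * py[y] for x in px for y in py}
assert abs(mutual_information(joint_indep)) < 1e-12

# A dependent joint (X and Y tend to agree) has strictly positive mutual information.
joint_dep = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
assert mutual_information(joint_dep) > 0
```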

36.9 (Strengthening Theorem 36.8) Show that Theorem 36.8 continues to
hold even when the condition $\max_{t \in [n]} \Gamma_t \le \bar\Gamma$ is replaced by $\frac{1}{n}\sum_{t=1}^{n} \Gamma_t \le \bar\Gamma$.
36.10 (Information directed sampling) Prove that for any prior such that
$X_{ti} \in [0,1]$ almost surely, the Bayesian regret of information-directed sampling
satisfies
$$\mathrm{BR}_n \le \sqrt{\frac{kn\log(k)}{2}}\,.$$


36.11 (From Bayesian to adversarial regret) Let $\mathcal{E} = \{0,1\}^{n \times k}$ and $\mathcal{Q}$
be the space of probability measures on $\mathcal{E}$. Prove that
$$R_n^*(\mathcal{E}) = \sup_{Q \in \mathcal{Q}} \mathrm{BR}_n^*(Q)\,.$$

36.12 (From Bayesian to adversarial regret) Let $\mathcal{E} = [0,1]^{n \times k}$. Prove
that
$$R_n^*(\mathcal{E}) = \sup_{Q \in \mathcal{Q}} \mathrm{BR}_n^*(Q)\,,$$
where $\mathcal{Q}$ is the set of probability measures on $(\mathcal{E}, 2^{\mathcal{E}})$ with finite support.
