Bandit Algorithms

(Jeff_L) #1
14.5 Exercises 186

(b) Show that the Radon-NykodimdP/dμexists and thatdP/dμ(ω) =p(ω).


14.7(Relative entropy for Gaussian distributions) For eachi∈{ 1 , 2 }
letμi∈R,σ^2 i>0 andPi=N(μi,σi^2 ). Show that


D(P 1 ,P 2 ) =


1


2


(


log

(


σ 22
σ 12

)


+


σ^21
σ^22

− 1


)


+


(μ 1 −μ 2 )^2
2 σ 22

.


14.8Letλbe the Lebesgue measure on (R,B(R)). Find:


(a)a probability measure (R,B(R)) that is not absolutely continuous with respect
toλ.


(b)a probability measurePon (R,B(R)) that is absolutely continuous toλwith
D(P,Q) =∞whereQ=N(0,1) is the standard Gaussian measure.


14.9(Data processing inequality) LetPandQbe measures on (Ω,F) and
letGbe a sub-σ-algebra ofFandPGandQGbe the restrictions ofPandQto
(Ω,G). Show that D(PG,QG)≤D(P,Q).


14.10Let (Ω,F) be a measurable space andP,Q:B(R)×Ω→[0,1] be a pair
of probability kernels from (Ω,F) to (R,B(R)). Prove that


V={ω∈Ω : D(P(·|ω),Q(·|ω)) =∞}∈F.

Hint Apply Dobrushin’s theorem to the field of finite unions of rational-valued
intervals inR.

14.11(Chain rule) LetPandQbe measures on (Rn,B(Rn)) and fort∈[n]
letXt(x) =xtbe the coordinate project fromRn→R. Then letPtandQtbe
regular versions ofXtgivenX 1 ,...,Xt− 1 underPandQrespectively. Show that


D(P,Q) =


∑n

t=1

EP[D(Pt(·|X 1 ,...,Xt− 1 ),Qt(·|X 1 ,...,Xt− 1 ))]. (14.16)

Hint This is a rather technical exercise. You will likely need to apply a
monotone class argument [Kallenberg, 2002, Theorem 1.1]. For the definition
of a regular version see [Kallenberg, 2002, Theorem 5.3] or Theorem 3.11.
Briefly,Ptis a probability kernel from (Rt−^1 ,B(Rt−^1 )) to (R,B(R)) such that
Pt(A|x 1 ,...,xt− 1 ) =P(Xt∈A|X 1 ,...,Xt− 1 ) withP-probability one for all
A∈B(R).

14.12(Chain rule (cont.)) LetPandQbe measures on (Rn,B(Rn)) and
fort∈[n] letXt(x) =xtbe the coordinate project fromRn→R. Then letPt
andQtbe regular versions ofXtgivenX 1 ,...,Xt− 1 underPandQrespectively.
Letτbe a stopping time adapted to the filtration generated byX 1 ,...,Xnwith

Free download pdf