38.10 Exercises 503
distributionμ∈ P(S) show there exists a probability space (Ω,F,P) and an
infinite sequence of random elementsS 1 ,A 1 ,S 2 ,A 2 ,...such that for anys∈S,
a∈Aandt∈N,
(a)P(S 1 =s) =μ(s).
(b)P(St+1=s|S 1 ,A 1 ,...,St,At) =PAt(St,s).
(c)P(At=a|S 1 ,A 1 ,...,St) =πt(a|S 1 ,A 1 ,...,St− 1 ,At− 1 ,St).
Hint Use Theorem 3.3.
38.2(Sufficiency of Markov policies) LetM= (S,A,P) be a finite
controlled Markov environment,πbe an arbitrary policy andμ∈ P(S) an
arbitrary initial state distribution. Denote byPπμthe probability distribution that
results from the interconnection ofπandMwhile the initial state distribution is
μ.
(a) Show there exists a Markov policyπ′such that
Pπμ(St=s,At=a) =Pπ
′
μ(St=a,At=a).
for allt≥1 ands,a∈S×A.
(b)Conclude that for any policyπthere exists a Markov policyπ′such that for
anys∈S, ̄ρπs= ̄ρπ
′
s.
Hint Defineπ′in an inductively starting att= 1. Puterman [Theorem 5.5.1
2009] proves this result and credits Strauch [1966].
38.3(Deterministic policies minimize travel time) LetP be some
transition structure over some finite state spaceSand some finite action space
A. Show that the expected travel time between two statess,s′ofSis minimized
by a deterministic policy.
Hint Letτ∗(s,s′) be the shortest expected travel time between some arbitrary
pairs of states, which fors=s′is defined to be zero. Show thatτ∗satisfies the
fixed point equation
τ∗(s,s′) =
{
0 , ifs=s′;
1 + mina
∑
s′′Pa(s,s
′′)τ∗(s′′,s′), otherwise.
38.4(Strongly connected⇔finite diameter) LetMbe a finite MDP.
Prove thatD(M)<∞is equivalent toMbeing strongly connected.
38.5(Diameter lower bound) LetM= (S,A,P,r) be any MDP. Show that
D(M)≥logA(S)−3.
Hint Denote by d∗(s,s′) the minimum expected time it takes to reach
states′when starting from states. The definition ofd∗can be extended to
arbitrary initial distributionsμ 0 over states and setsU⊂ Sof target states: