Dynamic programming
Definition
1. The history $h_t$ at time $t$ is the sequence $h_t = (s_1, a_1, \ldots, a_{t-1}, s_t)$.
(Observe that the definition of the history contains $s_t$ but no $a_t$.)
2. A strategy $\sigma$ is a sequence of mappings $\sigma_t$, each of which gives the next action
$a_t \in \Phi(s_t)$. Each $\sigma_t$ can depend on the whole actual history $h_t$.
3. $\Sigma$ denotes the set of strategies.
4. The value function is
$$V(s) \triangleq \max_{\sigma \in \Sigma,\ \sigma_0 = s} W(\sigma) \triangleq \max_{\sigma \in \Sigma,\ \sigma_0 = s} \sum_{t=1}^{T} r_t(\sigma_t)$$
if the maximum is attained; otherwise we write $\sup$ instead of $\max$.
5. One can also define $V_t(s)$ in the same way as $V(s)$ above, except that the
optimization starts at time period $t$.
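
To make the definitions concrete, here is a minimal Python sketch that computes the value function by brute force on a toy problem. Everything specific in it is an assumption for illustration and not part of the definition: the horizon `T = 3`, the "cake-eating" rule that the feasible set Φ(s) is {0, ..., s}, the deterministic transition `next_state`, and the reward `sqrt(a)`. Because transitions are assumed deterministic, a history-dependent strategy started from a fixed state traces out exactly one action path, so in this toy setting maximizing over paths coincides with maximizing over strategies.

```python
import math

T = 3  # hypothetical horizon: periods t = 1, ..., T


def feasible_actions(s):
    """Phi(s): hypothetical feasible set, consume any integer amount up to s."""
    return range(s + 1)


def next_state(s, a):
    """Hypothetical deterministic transition: the stock shrinks by the action."""
    return s - a


def reward(t, s, a):
    """Hypothetical period reward r_t; here it depends only on the action."""
    return math.sqrt(a)


def plans(s, t):
    """Yield (history, total reward) for every feasible path from period t to T.

    The history has the form (s_t, a_t, ..., a_T, s_{T+1}): it ends with a
    state, not an action, as in the definition of the history.
    """
    if t > T:
        yield (s,), 0.0
        return
    for a in feasible_actions(s):
        for tail, w in plans(next_state(s, a), t + 1):
            yield (s, a) + tail, reward(t, s, a) + w


def value(s, t=1):
    """V_t(s): the largest total reward when optimization starts at period t."""
    return max(w for _, w in plans(s, t))


if __name__ == "__main__":
    print(value(3))       # V(3) = V_1(3)
    print(value(3, t=3))  # V_3(3): only the last period remains
```

With these assumed primitives, `value(3)` evaluates to 3.0 (consume one unit in each of the three periods), while `value(3, t=3)` equals `sqrt(3)` because only one period is left. Enumerating all paths is exponential in `T`; the sketch is meant only to mirror the definition, not to be an efficient solution method.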