Dynamic programming
Problem
We want to maximize the aggregate reward
\[ \sum_{t=1}^{T} r_t(s_t, a_t) \to \max, \]
under the conditions that
\[ s_1 \in S, \]
\[ s_t = f_{t-1}(s_{t-1}, a_{t-1}), \quad t = 2, \dots, T, \]
\[ a_t \in \Phi(s_t) \subset A, \quad t = 1, 2, \dots, T, \]
where $a_t$ is the action one can choose and $s_t$ is the state of the system.
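The other objects in the problem (the reward functions $r_t$, the transition
functions $f_t$, the feasible-action sets $\Phi(s)$, the sets $S$ and $A$, and
the horizon $T$) are given before the optimization and are parameters of the
problem.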
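To make the problem statement concrete, here is a minimal sketch of backward induction, one standard way to solve a problem of this form. It is not necessarily the method these notes go on to develop. It assumes that S and A are finite and that every transition lands back in S. The names `solve_finite_horizon`, `rollout`, `feasible`, `reward`, and `transition` are hypothetical stand-ins for the solver, the optimal path, $\Phi$, $r_t$, and $f_t$.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable
Action = Hashable


def solve_finite_horizon(
    T: int,
    states: Iterable[State],                            # the set S (assumed finite)
    feasible: Callable[[State], Iterable[Action]],      # Phi(s), a subset of A
    reward: Callable[[int, State, Action], float],      # r_t(s, a)
    transition: Callable[[int, State, Action], State],  # f_t(s, a), used for t = 1..T-1
) -> Tuple[Dict[Tuple[int, State], float], Dict[Tuple[int, State], Action]]:
    """Return value[(t, s)] and policy[(t, s)] for t = 1..T by backward induction."""
    states = list(states)
    value: Dict[Tuple[int, State], float] = {}
    policy: Dict[Tuple[int, State], Action] = {}
    for t in range(T, 0, -1):  # t = T, T-1, ..., 1
        for s in states:
            best_val, best_act = None, None
            for a in feasible(s):
                total = reward(t, s, a)
                if t < T:
                    # Add the best reward obtainable from the next state onward.
                    # Assumes transition(t, s, a) is an element of `states`.
                    total += value[(t + 1, transition(t, s, a))]
                if best_val is None or total > best_val:
                    best_val, best_act = total, a
            if best_act is None:
                raise ValueError(f"no feasible action in state {s!r}")
            value[(t, s)] = best_val
            policy[(t, s)] = best_act
    return value, policy


def rollout(T: int, s1: State, policy, transition) -> List[Tuple[State, Action]]:
    """Follow the computed policy from s_1 and return the (s_t, a_t) pairs."""
    s, path = s1, []
    for t in range(1, T + 1):
        a = policy[(t, s)]
        path.append((s, a))
        if t < T:
            s = transition(t, s, a)  # s_{t+1} = f_t(s_t, a_t)
    return path
```

As an illustrative (made-up) instance, take T = 3, S = {0, 1, 2}, actions 0 ("stay") and 1 ("step up", not allowed in state 2), f_t(s, a) = s + a, and r_t(s, a) = 2s - a. Starting from s_1 = 0, the sketch yields the path (0, 1), (1, 1), (2, 0) with aggregate reward -1 + 1 + 4 = 4.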