MDP model - rewards
R(s,a) = reward received for taking action a
in state s
(a random variable).
Example:
R(s,a) = -1  with probability 0.5
         +10 with probability 0.35
         +20 with probability 0.15
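
Because R(s,a) is a random variable, it has an expected value; for the example distribution this is (-1)(0.5) + (10)(0.35) + (20)(0.15) = 6. A minimal Python sketch of sampling from this distribution and computing its mean follows. The names REWARD_DIST, expected_reward, and sample_reward are illustrative assumptions, not part of the slide.

```python
import random

# Outcomes and probabilities copied from the example above.
# The (reward, probability) list layout is an assumption made for this sketch.
REWARD_DIST = [(-1, 0.5), (10, 0.35), (20, 0.15)]


def expected_reward(dist):
    """Return the mean of a discrete reward distribution."""
    return sum(reward * prob for reward, prob in dist)


def sample_reward(dist, rng=random):
    """Draw one reward according to the given probabilities."""
    rewards = [reward for reward, _ in dist]
    probs = [prob for _, prob in dist]
    return rng.choices(rewards, weights=probs, k=1)[0]


if __name__ == "__main__":
    print("expected reward:", expected_reward(REWARD_DIST))
    print("one sampled reward:", sample_reward(REWARD_DIST))
```

Each call to sample_reward can return a different value for the same (s,a) pair, which is what "random variable" means here.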