Immediate rewards and transition probabilities

As a result of performing an action ${a}\in{A_{s}}$ in state $s\in{S}$ at time $t$:

1. The agent receives an immediate reward $R_t(s,a)$. We denote its expectation by $r_t(s,a)=E[R_t(s,a)]$.

2. The system moves to a new state $s'$, chosen according to the transition probability $P_t(s'\vert s,a)$. We assume that $P_t$ is well defined, that is, for every $s\in{S}$ and ${a}\in{A_{s}}$, $\sum_{j\in S} P_t(j\vert s,a) = 1$.
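The two items above can be sketched as a single step of a simulated process. The following is a minimal sketch with a hypothetical two-state example (the states, actions, and probability values are invented for illustration): performing an action yields the immediate reward and a next state sampled from $P_t(\cdot\vert s,a)$.

```python
import random

# A toy, hypothetical MDP. P[s][a] maps each next state to its
# transition probability; r[s][a] is the expected immediate reward.
P = {
    "s0": {"stay": {"s0": 0.9, "s1": 0.1}, "go": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0},            "go": {"s0": 0.5, "s1": 0.5}},
}
r = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 0.5, "go": 0.0},
}

def step(s, a, rng=random):
    """Perform action a in state s: return (immediate reward, next state)."""
    dist = P[s][a]
    # Sanity check that P is well defined: probabilities sum to 1.
    assert abs(sum(dist.values()) - 1.0) < 1e-9
    next_states = list(dist.keys())
    probs = list(dist.values())
    s_next = rng.choices(next_states, weights=probs, k=1)[0]
    return r[s][a], s_next

reward, s_next = step("s0", "go")
```

Here the reward is the expectation $r_t(s,a)$ directly; a finer simulation could instead draw the random variable $R_t(s,a)$ from its own distribution.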

We will not discuss how or when the immediate reward reaches the agent: it may accumulate over the time interval $[t, t+1]$, or it may be given at a single point in time between $t$ and $t+1$. In any case, all that matters to the agent is that the immediate reward arrives before time $t+1$.

A process is Markovian if the only information it needs from its history is the current state. We define a Markovian process by the tuple $(T, S, A, P_t(\cdot\vert s,a), R_t(s,a))$. The process defined here is Markovian, since the next state and immediate reward (and therefore the entire continuation of the process) depend only on the current state and the action chosen, and not on the rest of the history.
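The Markov property can be made concrete by simulating a trajectory: at every step, the next state is drawn from a distribution that depends only on the current state, so the rest of the trajectory never consults earlier history. The following is a small sketch with a hypothetical two-state chain (state names and probabilities are invented for illustration).

```python
import random

# Hypothetical two-state Markov chain: the distribution of the next
# state depends only on the current state, not on how we got there.
P = {"rain": {"rain": 0.7, "sun": 0.3},
     "sun":  {"rain": 0.4, "sun": 0.6}}

def rollout(s, horizon, rng):
    """Generate a trajectory of `horizon` transitions starting from s."""
    traj = [s]
    for _ in range(horizon):
        dist = P[traj[-1]]  # looks only at the current state traj[-1]
        states, probs = zip(*dist.items())
        traj.append(rng.choices(states, weights=probs, k=1)[0])
    return traj

traj = rollout("rain", 5, random.Random(0))
```

Because `rollout` reads only `traj[-1]` at each step, replacing the stored history with just the current state would produce the same distribution over continuations, which is exactly the Markov property.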


Yishay Mansour
1999-11-15