The Return Function

We will suggest three possible return functions for the infinite horizon problem:
1. The expected sum of the immediate rewards, i.e.

\begin{eqnarray*}V^{\pi}(s) & = & \lim_{N \rightarrow \infty} E_s^{\pi}[\sum_{t=1}^N r_t(X_t, Y_t)]\\
& = & \lim_{N \rightarrow \infty} V_N^{\pi}(s)
\end{eqnarray*}


Note that this limit need not exist: for example, if every immediate reward equals 1 then $V_N^{\pi}(s) = N$, which diverges.
2. The expected discounted sum of the immediate rewards, i.e.

\begin{displaymath}V_\lambda^{\pi}(s) = \lim_{N \rightarrow \infty} E_s^{\pi}[\sum_{t=1}^N \lambda^{t-1}r_t(X_t, Y_t)],\;\;\; 0 < \lambda < 1\end{displaymath}

In this case, a sufficient condition for convergence is, for example, that the rewards are bounded: $\vert r(\cdot , \cdot)\vert \leq M$.
Under this condition we can find an upper bound on the return function:

\begin{displaymath}\vert V_\lambda^{\pi}(s)\vert \leq \sum_{t=1}^{\infty} \lambda^{t-1}M = \frac{M}{1-\lambda}\end{displaymath}

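Note that this bound is very sensitive to the value of the parameter $\lambda$: it grows without bound as $\lambda \rightarrow 1$. In that limit the discounted return approaches the undiscounted one, provided the latter is finite: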

\begin{displaymath}\lim_{\lambda \rightarrow 1}V_{\lambda}^{\pi}(s) = V^{\pi}(s)\end{displaymath}

3. The expected average reward

\begin{eqnarray*}g^{\pi}(s) & = & \lim_{N \rightarrow \infty} \frac{1}{N} E_s^{\pi}[\sum_{t=1}^N r_t(X_t, Y_t)]\\
& = & \lim_{N \rightarrow \infty} \frac{1}{N} V^{\pi}_N(s)
\end{eqnarray*}


This limit does not always exist. A sufficient set of conditions for the limit to exist is, for example:
(a) S is finite
(b) $\pi$ is Markovian and stationary
(c) the system is aperiodic (non-periodic)
These conditions will be discussed further in a later lecture.
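
To make the three return functions concrete, the following is a minimal Python sketch, not part of the original notes. It assumes a hypothetical two-state chain obtained by fixing some stationary policy $\pi$, so that the transition matrix `P`, the per-state expected reward vector `r`, and the helper names `finite_horizon_return` and `discounted_return` are all illustrative choices. The discounted return is computed from the standard linear system $(I - \lambda P)v = r$, which is swapped in here as a convenient closed form rather than taken from the text above.

```python
import numpy as np

# Hypothetical two-state chain induced by some fixed stationary policy (not from the notes).
# P[i, j] = Pr(X_{t+1} = j | X_t = i); r[i] = expected immediate reward in state i.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
M = 1.0  # bound on |r|


def finite_horizon_return(P, r, N, lam=1.0):
    """E_s[sum_{t=1}^N lam^(t-1) r(X_t)] for every start state s (lam=1 gives V_N)."""
    n = len(r)
    v = np.zeros(n)
    dist = np.eye(n)  # row s holds the distribution of X_t given X_1 = s
    discount = 1.0    # lam^(t-1)
    for _ in range(N):
        v += discount * (dist @ r)
        dist = dist @ P
        discount *= lam
    return v


def discounted_return(P, r, lam):
    """Infinite-horizon discounted return, obtained by solving (I - lam P) v = r."""
    return np.linalg.solve(np.eye(len(r)) - lam * P, r)


# 1. Undiscounted sum: V_N keeps growing with N, so the limit diverges here.
for N in (10, 100, 1000):
    print("V_N, N =", N, finite_horizon_return(P, r, N))

# 2. Discounted sum: finite, and never above M / (1 - lam).
for lam in (0.5, 0.9, 0.99):
    print("lam =", lam, discounted_return(P, r, lam), "bound:", M / (1 - lam))

# 3. Average reward: V_N / N settles down as N grows.
for N in (10, 100, 10000):
    print("V_N / N, N =", N, finite_horizon_return(P, r, N) / N)
```

For this particular chain the stationary distribution is $(5/6, 1/6)$, so $V_N/N$ should approach $5/6$ from either start state, while $V_N$ itself grows roughly linearly in $N$ and the discounted values stay below $M/(1-\lambda)$.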



Yishay Mansour
1999-11-18