The Return Function

We will suggest three possible return functions for the infinite horizon problem:

1. The expected sum of the immediate rewards, i.e.

$\begin{eqnarray*}V^{\pi}(s) & = & \lim_{N \rightarrow \infty} E_s^{\pi}[\sum_{t=1}^N r_t(X_t, Y_t)]\\ & = & \lim_{N \rightarrow \infty} V_N^{\pi}(s) \end{eqnarray*}$

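Note that this return function may diverge. For example, if every immediate reward equals 1, then $V_N^{\pi}(s) = N$, which grows without bound.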

2. The expected discounted sum of the immediate rewards, i.e.

$\begin{displaymath}V_\lambda^{\pi}(s) = \lim_{N \rightarrow \infty} E_s^{\pi}[\sum_{t=1}^N \lambda^{t-1}r_t(X_t, Y_t)],\;\;\; 0 < \lambda < 1\end{displaymath}$

In this case, a sufficient condition for convergence is, for example, that the rewards are bounded: $\vert r(\cdot , \cdot)\vert \leq M$.
Under this condition the return function is bounded by a geometric series:

$\begin{displaymath}\vert V_\lambda^{\pi}(s)\vert \leq \sum_{t=1}^{\infty} \lambda^{t-1}M = \frac{M}{1-\lambda}\end{displaymath}$

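Note that this bound is very sensitive to the value of the parameter $\lambda$: it grows without bound as $\lambda \rightarrow 1$. In that limit the discounted return approaches the undiscounted return of item 1, whenever the latter limit exists:

$\begin{displaymath}\lim_{\lambda \rightarrow 1}V_{\lambda}^{\pi}(s) = V^{\pi}(s)\end{displaymath}$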

3. The expected average reward, i.e.

$\begin{eqnarray*}g^{\pi}(s) & = & \lim_{N \rightarrow \infty} \frac{1}{N} E_s^{\pi}[\sum_{t=1}^N r_t(X_t, Y_t)]\\ & = & \lim_{N \rightarrow \infty} \frac{1}{N} V^{\pi}_N(s) \end{eqnarray*}$

This limit does not always exist. A sufficient condition for the limit's existence is, for example:

(a): $S$ is finite
(b): $\pi$ is Markovian and stationary
(c): the system is aperiodic

These conditions will be discussed further in a later lecture.
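
The three return functions above can be compared numerically on a small example. The following is a minimal sketch, not part of the original notes. It assumes a hypothetical two-state Markov chain that a fixed stationary Markovian policy might induce, with the action folded into a per-state reward. The names P, R, expected_rewards, total_return, discounted_return and average_reward are illustrative assumptions. Expectations are computed exactly by propagating the state distribution, so no sampling is involved.

```python
# Hypothetical two-state chain induced by a fixed stationary Markovian policy
# (an assumption for illustration; it does not come from the lecture notes).
# P[i][j] is the probability of moving from state i to state j,
# R[i] is the immediate reward collected in state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
R = [1.0, -2.0]
M = max(abs(r) for r in R)  # bound on the immediate rewards, |r| <= M


def expected_rewards(P, R, start, horizon):
    """Return [E[r_1], ..., E[r_N]] for N = horizon, given X_1 = start."""
    n = len(R)
    dist = [1.0 if s == start else 0.0 for s in range(n)]
    rewards = []
    for _ in range(horizon):
        rewards.append(sum(d * r for d, r in zip(dist, R)))
        # Advance the state distribution one step: dist <- dist * P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return rewards


def total_return(P, R, start, horizon):
    """Finite-horizon return V_N(start): plain sum of expected rewards."""
    return sum(expected_rewards(P, R, start, horizon))


def discounted_return(P, R, start, lam, horizon):
    """Discounted return truncated at N: sum of lam^(t-1) * E[r_t]."""
    rewards = expected_rewards(P, R, start, horizon)
    # enumerate starts at 0, which matches the exponent t-1 for t = 1.
    return sum(lam ** k * r for k, r in enumerate(rewards))


def average_reward(P, R, start, horizon):
    """Average reward (1/N) * V_N(start)."""
    return total_return(P, R, start, horizon) / horizon


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print("N =", n,
              " V_N =", total_return(P, R, 0, n),
              " V_N / N =", average_reward(P, R, 0, n))
    for lam in (0.5, 0.9, 0.99):
        print("lambda =", lam,
              " V_lambda ~", discounted_return(P, R, 0, lam, 3000),
              " bound M/(1-lambda) =", M / (1.0 - lam))
```

For this particular chain the stationary distribution is (5/6, 1/6), so the expected reward per step settles at 5/6 - 2/6 = 1/2. The undiscounted sum therefore grows roughly like N/2 and diverges, the average reward approaches 1/2, and the discounted return stays within M/(1-lambda) for every lambda below 1.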

Yishay Mansour
1999-11-18