Calculating the Return Value of a Given Policy

According to Theorem 4.3 from the previous lecture, for each stochastic history-dependent policy $ \pi=(d_1, d_2, \ldots) \in
\Pi^{HR}$ there exists a Markovian stochastic policy $\pi^{'}=(d_1^{'}, d_2^{'}, \ldots) \in \Pi^{MR}$ that has the same return, i.e., $ v_\lambda^\pi=v_\lambda^{\pi^{'}}$.

Let $\pi\in\Pi^{MR}$. Then

\begin{eqnarray*}v_\lambda^\pi(s) & = &
E_s^{\pi}[\sum_{t=1}^\infty\lambda^{t-1}r(X_t,Y_t)] =
\sum_{t=1}^{\infty}\lambda^{t-1}P_{\pi}^{t-1}r_{d_{t}}
\end{eqnarray*}

where $P_{\pi}^{t-1}=P_{d_{1}}P_{d_{2}}\cdots P_{d_{t-1}}$ is the $(t-1)$-step transition matrix induced by $\pi$ (with $P_{\pi}^{0}=I$).
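The series above can be evaluated numerically by truncating it at a large horizon. The sketch below assumes a small hypothetical 2-state MDP under a stationary policy (so $P_\pi^{t-1}=P^{t-1}$); the matrix $P$, reward vector $r$, and discount $\lambda$ are illustrative choices, not values from the lecture.

```python
import numpy as np

# Hypothetical induced transition matrix P_d and reward vector r_d
# for a 2-state MDP under a fixed stationary policy (assumed for
# illustration only).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
lambda_ = 0.9  # discount factor

# Truncated version of v = sum_{t>=1} lambda^{t-1} P^{t-1} r.
v = np.zeros(2)
Pt = np.eye(2)          # P^0 = I
for t in range(1, 200):
    v += lambda_ ** (t - 1) * Pt @ r
    Pt = Pt @ P         # advance to P^t
print(v)
```

Since $\lambda<1$, the tail of the series is geometrically small, so a few hundred terms already give the return to high accuracy.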



\begin{eqnarray*}\vec{v}_\lambda^\pi & = & \vec{r}_{d_{1}} + \lambda
P_{d_{1}}[\underbrace{\vec{r}_{d_{2}}+\lambda
P_{d_{2}}\vec{r}_{d_{3}}+\ldots}_{v_\lambda^{\pi^{'}}}]
\end{eqnarray*}


(where $\pi^{'}=(d_2, d_3, \ldots)$ is the policy $\pi$ shifted by one step, i.e., $\pi$ as followed from the second step onward), so

\begin{eqnarray*}\vec{v}_{\lambda}^{\pi} = \vec{r}_{d_{1}}+\lambda
P_{d_{1}}\vec{v}_{\lambda}^{\pi^{'}}
\end{eqnarray*}


If $\pi$ is stationary, i.e., $d_t=d_1$ for all $t$, then $\pi^{'}=\pi$ and

\begin{eqnarray*}\vec{v}_{\lambda}^{\pi} = \vec{r}_{d_{1}}+\lambda
P_{d_{1}}\vec{v}_{\lambda}^{\pi}
\end{eqnarray*}


All the parameters aside from $\vec{v}_\lambda^\pi$ are known; thus we have a system of linear equations of the form $\vec{x}=\vec{r}_{d_{1}}+\lambda P_{d_{1}}\vec{x}$, or equivalently $(I-\lambda P_{d_{1}})\vec{x}=\vec{r}_{d_{1}}$. We will show that this system has a unique solution, which is $\vec{v}_\lambda^\pi$.
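In practice, the linear system can be solved directly rather than by summing the series. The sketch below uses a hypothetical 2-state example (the matrix $P_{d_1}$, reward $\vec{r}_{d_1}$, and discount $\lambda$ are assumed values for illustration) and verifies that the solution satisfies the fixed-point equation.

```python
import numpy as np

# Hypothetical transition matrix P_{d1} and reward vector r_{d1}
# (illustrative values, not from the lecture).
P = np.array([[0.5, 0.5],
              [0.1, 0.9]])
r = np.array([2.0, -1.0])
lambda_ = 0.95

# Solve x = r + lambda * P x, rewritten as (I - lambda * P) x = r.
v = np.linalg.solve(np.eye(2) - lambda_ * P, r)

# The solution is exactly the fixed point v = r + lambda * P v.
assert np.allclose(v, r + lambda_ * P @ v)
print(v)
```

For an $N$-state MDP this is a single $N \times N$ linear solve; invertibility of $I-\lambda P_{d_1}$ for $\lambda<1$ is exactly the uniqueness claim established next.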


 

Yishay Mansour
1999-11-24