next up previous
Next: Bounding the Number of Up: Proof for The Indirect Previous: Proof for The Indirect

   
Notations

Claim 8.1     For any $w>0,\ (s,a)\in S\times A$

\begin{displaymath}\Pr( \left\Vert\sum_{j\in S}{\widehat{P}^a}_{sj}v_k(j) -
\s...
...}v_k(j)\right\Vert \geq w )
\leq 2e^{-2\cdot m_I\cdot w^2}
\end{displaymath} (18)

Proof:The samples are independent and identically distributed; they are bounded since the immediate return is bounded and $\gamma<1$; and:

\begin{eqnarray*}E[\sum_{j\in S}{\widehat{P}^a}_{sj} \cdot v_k(j)]
&=& \sum_{j...
...cdot
v_k(j)\\
&=& \sum_{j\in S} P^{a}_{sj}\cdot v_k(j)\\
\end{eqnarray*}


therefore, by Chernoff-inequality, the claim holds:

\begin{eqnarray*}\lefteqn{ \Pr( \left\Vert\sum_{j\in S}{\widehat{P}^a}_{sj} \cdo...
...v_k(j)]\right\Vert \geq w
)
\leq 2e^{-2\cdot m_I\cdot w^2}
\end{eqnarray*}


$\Box$

Claim 8.2   Bounding $\left\Vert \widehat{v}_l - v_l \right\Vert$: 
1.
$\left\Vert\widehat{v}_l - v_l\right\Vert \leq \left\Vert \widehat{Q}_l - Q_l
\right\Vert$
2.
$\sum_{j \in S} {\widehat{P}^a}_{sj} = 1$
3.
$\left\Vert\widehat{v}_l - v_l\right\Vert
\leq \max_{s\in S,\ a\in A,\ 0\leq k...
...
- \sum_{j\in S}P^{a}_{sj}v_k(j) \right\Vert } \cdot
\sum_{i=1}^l{\gamma^i}$

Proof:

$\Box$

Claim 8.3     As a conclusion from claims [*] and [*] we get that for every w>0:

\begin{displaymath}\Pr( \left\Vert\widehat{v}_l - v_l\right\Vert \geq w ) \leq
...
...\cdot 2e^{-2m_I({\frac{w(1-\gamma)}{\gamma(1-\gamma^l)}})^2}
\end{displaymath} (19)

Proof:


$\Box$

Claim 8.4     As we've seen in class:

\begin{displaymath}\left\Vertv_l - v^*\right\Vert \leq \frac{\gamma^l}{1-\gamma}
\end{displaymath} (20)

Claim 8.5   For every $l > \frac{\ln{\varepsilon(1-\gamma)}}{\ln\gamma}$,

\begin{displaymath}\Pr( \left\Vert\widehat{v}_l-v^*\right\Vert \geq \varepsilon ...
...silon
\frac{\gamma^l}{1-\gamma})}{\gamma(1-\gamma^l)})}^2}
\end{displaymath} (21)

Proof:

\begin{eqnarray*}\Pr( \left\Vert\widehat{v}_l-v^*\right\Vert \geq \varepsilon)
...
...on -
\frac{\gamma^l}{1-\gamma})}{\gamma(1-\gamma^l)})}^2}\\
\end{eqnarray*}


Let's see when does $\varepsilon - \frac{\gamma^l}{1-\gamma} >
0$:

\begin{eqnarray*}\varepsilon - \frac{\gamma^l}{1-\gamma} > 0
&\Leftrightarrow&...
...&&(Since\ \ln\gamma < 1\ due\ to\ the\ fact\ that\ 0<\gamma<1)
\end{eqnarray*}


And this is the claim's assumption, so the proof is complete. $\Box$


next up previous
Next: Bounding the Number of Up: Proof for The Indirect Previous: Proof for The Indirect
Yishay Mansour
2000-05-30