next up previous
Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal

   
The Solution

We will first construct a corresponsing MDP for the problem, and then develop the optimal policy for it.
Let $A=\{Continue, QuitAndHire\}$ the possible actions, where Continue stands for 'continue' and QuitAndHire for 'quit and recruit'. Let $S=\{MaxSoFar, Other, 1, 0\}$ be the group of states of the MDP, standing for:
  
Figure: The recruiting problem MDP

Figure [*] shows the resultant MDP, using the following transition probabilities: where N is the finite time horizon (the size of the candidates group) and t is the current time (the number of candidates interviewed so far).
Writing the optimality equations for this problem we get:

\begin{displaymath}U_t^*(1)=1, \;\; U_t^*(0)=0, \;\; U_N^*(MaxSoFar)=U_N^*(Other)=0\end{displaymath}


\begin{eqnarray*}U_t^*(Other) & = & \max\{0, r_tU_{t+1}^*(MaxSoFar)+(1-r_t)U_{t+...
...-r_t)U_{t+1}^*(Other)\} \\
& = & \max\{q_t, U_t^*(Other)\} \\
\end{eqnarray*}


Assigning the terms for qt and rt we get:

 \begin{displaymath}U_t^*(Other)=\frac{1}{t+1}U_{t+1}^*(MaxSoFar)+\frac{t}{t+1}U_{t+1}^*(Other)
\end{displaymath} (4.1)


 \begin{displaymath}U_t^*(MaxSoFar)=\max\{\frac{t}{N}, U_t^*(Other)\}
\end{displaymath} (4.2)

An optimal decision rule has the following properties: We now show that the optimal policy is of the form:


next up previous
Next: Optimality Proof Up: Example: The Recruiting Problem Previous: The Goal
Yishay Mansour
1999-11-18