
   
Remarks:

In the on-policy case, $\pi$ is $\epsilon$-greedy with respect to the current estimate of $Q$. This gives the SARSA algorithm.
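As a rough illustration of this remark, the following is a minimal sketch of an $\epsilon$-greedy action choice and a single on-policy (SARSA-style) update. It assumes a tabular $Q$ stored as a dictionary keyed by (state, action), integer-indexed actions, a step size `alpha`, and a discount factor `gamma`. The function names and parameters are hypothetical and are not taken from the notes.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick an action that is greedy w.r.t. the current Q."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))  # ties go to the lowest action index


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """One on-policy update. The target uses a_next, the action the
    epsilon-greedy policy actually selected in s_next, not a max over actions."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Assumed setup: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
rng = random.Random(0)
```

In a full loop one would choose `a_next = epsilon_greedy(Q, s_next, n_actions, epsilon, rng)` before calling `sarsa_update`, then carry `a_next` forward as the action to execute, so the policy being evaluated is the same one generating the behaviour.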


Yishay Mansour
2000-01-07