Dynamic Programming

Isaac Meilijson

Office Hours: Tuesdays 15-16. 311 Schreiber.

+972-3-640-8826.

Tuesdays 17:10-20:00 at 102 Ornstein

Topics:

Definition of a stochastic DP problem, policy, discounting. Examples.
Contraction operators and the Banach fixed point theorem.
Discounted Dynamic Programming. The fundamental theorem. Stationary policies. Howard and Eaton-Zadeh improvement routines. The discounted Chow & Robbins search problem.
Negative Dynamic Programming. The main theorem. Markovian policies. Howard improvement. Solution of the Needle in a Haystack problem. Priority rules in Queueing. The role of future irrelevance. The two-armed bandit problem.
Positive Dynamic Programming. Optimality of bold play in subfair casinos with fixed goal. The role of excessivity. Martingale proofs via excessivity.
One-optimal solutions and their relation to the average cost criterion. The Blackwell algorithm.
Optimal stopping. Backwards induction, passage to the limit. The Secretary problem. Wald's Sequential Probability Ratio Test. The Chow & Robbins search problem with cost per observation.
The Gittins index. Time sharing of a single server queue by a number of types of customers.

Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist. 33, 719-726.
Blackwell, D. (1965) Discounted dynamic programming. Ann. Math. Statist. 36, 226-235.
Chow, Y.S., Robbins, H. and Siegmund, D. (1971). Great expectations: the theory of optimal stopping. Houghton Mifflin, Boston.
Dubins, L.E. and Savage, L.J. (1965). How to gamble if you must. McGraw Hill, New York. Re-published as Inequalities for stochastic processes. Dover, New York (1976).
Feldman, D. (1962). Contributions to the two-armed bandit problem. Ann. Math. Statist. 33, 847-856.
Gittins, J.C. (1979). Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. (B) 41, 148-177.
Meilijson, I. and Weiss, G. (1977). Multiple feedback at a single server station. Stoch. Proc. Appl. 5, 195-205.
Meilijson, I. and Weiss, G. (1984). A simple optimal solution to a time sharing problem. Proceedings of the 4th International Conference in computer Sciences, Santiago de Chile.
Rodman, L. (1978). On the many-armed bandit problem. Ann. Prob. 6, 491-498.
Strauch, R.E. (1966). Negative dynamic programming. Ann. Math. Statist. 37, 871-890.