# Dynamic Programming

Isaac Meilijson
Department of Statistics and Operations Research
isaco@math.tau.ac.il
Office Hours: Tuesdays 15-16. 311 Schreiber.

+972-3-640-8826.

### Tel Aviv University course number: 0365-4432

### Class Hours,
Spring Semester 2006:

Tuesdays 17:10-20:00 at 102 Ornstein

### Due to the multiplicity of Tuesday holidays, two additional
classes will
be held, each at the regular schedule 17-20: on Sunday April 9, 2006
(first day of the Passover vacation) and on Thursday April 20,
2006 (Mimouna Day).

### Course Content:

## Topics:

- Definition of a stochastic DP problem, policy, discounting. Examples.
- Contraction operators and the Banach fixed point theorem.
- Discounted Dynamic Programming. The fundamental theorem.
Stationary policies. Howard and Eaton-Zadeh improvement routines.
The discounted Chow & Robbins search problem.
- Negative Dynamic Programming. The main theorem.
Markovian policies. Howard improvement. Solution of the Needle in
a Haystack problem. Priority rules in Queueing. The role of future
irrelevance. The two-armed bandit problem.
- Positive Dynamic Programming. Optimality of
bold play in subfair casinos with fixed goal. The role of excessivity.
Martingale proofs via excessivity.
- One-optimal solutions and their relation to the
average cost criterion. The Blackwell algorithm.
- Optimal stopping. Backwards induction,
passage to the limit. The Secretary problem. Wald's Sequential Probability
Ratio Test. The Chow & Robbins search problem with cost per observation.
- The Gittins index. Time sharing of a single
server queue by a number of types of customers.

### Helpful references; TENTATIVE - IN PREPARATION.

- Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist.
33, 719-726.
- Blackwell, D. (1965) Discounted dynamic programming. Ann. Math. Statist.
36, 226-235.
- Chow, Y.S., Robbins, H. and Siegmund, D. (1971). Great expectations:
the theory of optimal stopping. Houghton Mifflin, Boston.
- Dubins, L.E. and Savage, L.J. (1965). How to gamble if you must. McGraw
Hill, New York. Re-published as Inequalities for stochastic processes.
Dover, New York (1976).
- Feldman, D. (1962). Contributions to the two-armed bandit problem.
Ann. Math. Statist. 33, 847-856.
- Gittins, J.C. (1979). Bandit processes and dynamic allocation indices.
J. Roy. Statist. Soc. (B) 41, 148-177.
- Meilijson, I. and Weiss, G. (1977). Multiple feedback at a single
server station. Stoch. Proc. Appl. 5, 195-205.
- Meilijson, I. and Weiss, G. (1984). A simple optimal solution to a
time sharing problem. Proceedings of the 4th International Conference
in computer Sciences, Santiago de Chile.
- Rodman, L. (1978). On the many-armed bandit problem. Ann. Prob. 6,
491-498.
- Strauch, R.E. (1966). Negative dynamic programming. Ann. Math.
Statist. 37, 871-890.

##### Last update: March 08, 2006