Sunday, February 20, 14:15-15:15
at 14:00
Schreiber Building, Room 309
We show that, given a generative model, one can efficiently compute
a near-optimal policy, in time independent of the size of the
underlying MDP. Our technique can be viewed as a form of sparse
sampling in an MDP.
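A rough sketch of such a sparse-sampling planner is below. The function names and parameters (generative_model, width, depth) are illustrative assumptions, not taken from the talk: the planner estimates values by recursively drawing a fixed number of samples per action from the generative model, so its running time depends on the sampling width and depth but not on the number of states in the MDP.

```python
def sparse_sampling_value(generative_model, actions, state, depth, width, gamma):
    """Estimate the optimal value of `state` by sparse sampling.

    `generative_model(state, action)` is assumed to return one sampled
    (next_state, reward) pair.  `width` samples are drawn per action and
    the lookahead is truncated at `depth`; `gamma` is the discount factor.
    The cost is O((len(actions) * width) ** depth) generative-model
    queries, independent of the size of the underlying MDP.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = generative_model(state, a)
            # Recurse on the sampled next state with one less level of lookahead.
            total += reward + gamma * sparse_sampling_value(
                generative_model, actions, next_state, depth - 1, width, gamma)
        best = max(best, total / width)
    return best
```

Choosing the greedy action at the root with respect to these estimates gives a near-optimal policy, at a per-decision cost that again does not depend on the state-space size.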
Similarly, we address the question of finding a near-optimal
policy within a limited class of policies. We show that for a
finite class of policies, it suffices that the number of
queries to the generative model grows only logarithmically in
the number of policies. We also show that for an infinite
class of policies, the VC-dimension can be used to bound the
number of queries to the generative model.
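For the finite-class case, the underlying idea can be sketched as follows; the helper names (rollout, num_rollouts) are illustrative assumptions. Each candidate policy is evaluated by Monte Carlo rollouts through the generative model, and by Hoeffding's inequality plus a union bound over the class, a number of rollouts that grows only logarithmically in the number of policies suffices to select a near-optimal one with high probability.

```python
def select_policy(policies, rollout, num_rollouts):
    """Return the policy with the highest Monte Carlo return estimate.

    `rollout(policy)` is assumed to produce one sampled (discounted)
    return for `policy` using the generative model.  Taking
    num_rollouts on the order of log(len(policies)) / epsilon**2 makes
    every estimate epsilon-accurate simultaneously with high
    probability, so the selected policy is near-optimal in the class.
    """
    best_policy, best_estimate = None, float("-inf")
    for pi in policies:
        # Average num_rollouts independent sampled returns for this policy.
        estimate = sum(rollout(pi) for _ in range(num_rollouts)) / num_rollouts
        if estimate > best_estimate:
            best_policy, best_estimate = pi, estimate
    return best_policy
```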
The talk will not assume any prior knowledge about Markov Decision Processes.
This is joint work with Michael Kearns (AT&T) and Andrew Ng
(U.C. Berkeley).
For colloquium schedule, see http://www.math.tau.ac.il/~zwick/colloq.html