## Use of Linear Programming to Find the Optimal Policy

The general translation of the problem to a linear program is as follows:
Minimize: x
Subject To:

It was shown that the Optimality Equations of the discounted infinite horizon problem are:

Using the above translation we would get the following linear program:
Minimize:
Subject To:
is any set of constants with the following characteristics:
These constants can be viewed as the probability distribution of the agent's initial state in the MDP.
Accordingly, the dual linear program would be:
Maximize:
Subject To:
Intuitively, can be thought of as the probability of being in state s and performing action a, while taking into account the discount factor - . In the following theorem is defined more formally.

Theorem 6.8 (no proof)
1.
For every , and , let:

is a feasible solution of the dual linear maximization problem.
2.
Let be a feasible solution of the dual linear problem. For every and define dx(s)such that:

and then

Theorem is assuring that every solution to the dual linear program can be translated to a policy with a return value equal to the maximized element in the program. Thus, the best solution of the program can be translated to a policy with the maximal possible return value. This policy will, obviously, be the best policy.