Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach
Abstract: Mathematical programming formulation to identify the optimal value function of a negative Markov decision process (MDP) is non-convex, non-smooth, and computationally intractable. Also note that other well-known solution methods of MDP do not work properly for a negative MDP. More specifically, the policy iteration diverges, and the value iteration converges but does not provide an error bound in a finite number of iterations. To overcome this challenge, we develop a mixed integer linear programming (MILP) formulation using the theory of disjunctive programming. Then, we discuss its strength and how to extract the optimal value function and an optimal policy from this MILP formulation. Moreover, we exemplify our formulation for a class of stopping problems studied by Ross (1983). At the end, we briefly present another formulation developed by the traditional big-M method.
Keywords: Negative Markov Decision Process, Disjunctive Programming, Integer Programming, Optimal Stopping
Category 1: Other Topics (Dynamic Programming )
Category 2: Integer Programming ((Mixed) Integer Linear Programming )
Citation: Dehghanian, A., "Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach", working paper, 2019.
Entry Submitted: 02/15/2019
Modify/Update this entry
|Visitors||Authors||More about us||Links|
Search, Browse the Repository
Give us feedback
|Optimization Journals, Sites, Societies|