  


Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach
Amin Dehghanian(amin.dehghaniangmail.com) Abstract: Mathematical programming formulation to identify the optimal value function of a negative Markov decision process (MDP) is nonconvex, nonsmooth, and computationally intractable. Also note that other wellknown solution methods of MDP do not work properly for a negative MDP. More specifically, the policy iteration diverges, and the value iteration converges but does not provide an error bound in a finite number of iterations. To overcome this challenge, we develop a mixed integer linear programming (MILP) formulation using the theory of disjunctive programming. Then, we discuss its strength and how to extract the optimal value function and an optimal policy from this MILP formulation. Moreover, we exemplify our formulation for a class of stopping problems studied by Ross (1983). At the end, we briefly present another formulation developed by the traditional bigM method. Keywords: Negative Markov Decision Process, Disjunctive Programming, Integer Programming, Optimal Stopping Category 1: Other Topics (Dynamic Programming ) Category 2: Integer Programming ((Mixed) Integer Linear Programming ) Citation: Dehghanian, A., "Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach", working paper, 2019. Download: [PDF] Entry Submitted: 02/15/2019 Modify/Update this entry  
Visitors  Authors  More about us  Links  
Subscribe, Unsubscribe Digest Archive Search, Browse the Repository

Submit Update Policies 
Coordinator's Board Classification Scheme Credits Give us feedback 
Optimization Journals, Sites, Societies  