Optimization Online


Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach

Amin Dehghanian(amin.dehghanian***at***gmail.com)

Abstract: Mathematical programming formulation to identify the optimal value function of a negative Markov decision process (MDP) is non-convex, non-smooth, and computationally intractable. Also note that other well-known solution methods of MDP do not work properly for a negative MDP. More specifically, the policy iteration diverges, and the value iteration converges but does not provide an error bound in a finite number of iterations. To overcome this challenge, we develop a mixed integer linear programming (MILP) formulation using the theory of disjunctive programming. Then, we discuss its strength and how to extract the optimal value function and an optimal policy from this MILP formulation. Moreover, we exemplify our formulation for a class of stopping problems studied by Ross (1983). At the end, we briefly present another formulation developed by the traditional big-M method.

Keywords: Negative Markov Decision Process, Disjunctive Programming, Integer Programming, Optimal Stopping

Category 1: Other Topics (Dynamic Programming )

Category 2: Integer Programming ((Mixed) Integer Linear Programming )

Citation: Dehghanian, A., "Identifying the Optimal Value Function of a Negative Markov Decision Process: An Integer Programming Approach", working paper, 2019.

Download: [PDF]

Entry Submitted: 02/15/2019
Entry Accepted: 02/15/2019
Entry Last Modified: 02/15/2019

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society