Optimization Online


Policy-based branch-and-bound for infinite-horizon Multi-model Markov decision processes

Vinayak Ahluwalia (vahluw***at***umich.edu)
Lauren Steimle (steimle***at***gatech.edu)
Brian Denton (btdenton***at***umich.edu)

Abstract: MDPs are models for sequential decision-making that inform decision making in many fields, including healthcare, manufacturing, and others. However, the optimal policy for an MDP may be sensitive to the reward and transition parameters which are often uncertain because parameters are typically estimated from data or rely on expert opinion. To address parameter uncertainty in MDPs, it has been proposed that multiple models of the parameters be incorporated into the solution process, but solving these problems can be computationally challenging. In this article, we propose a policy-based branch-and-bound approach that leverages the structure of these problems and numerically compare several important algorithmic designs. We demonstrate that our approach outperforms existing methods on test cases from the literature including randomly generated MDPs, a machine maintenance MDP, and an MDP for medical decision making.

Keywords: Markov decision processes; parameter uncertainty; branch-and-bound

Category 1: Other Topics (Dynamic Programming )

Category 2: Stochastic Programming

Citation: January 2020

Download: [PDF]

Entry Submitted: 01/07/2020
Entry Accepted: 01/07/2020
Entry Last Modified: 01/07/2020

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society