Optimization Online


Robust Markov Decision Processes

Wolfram Wiesemann (wwiesema***at***imperial.ac.uk)
Daniel Kuhn (dkuhn***at***imperial.ac.uk)
Berc Rustem (br***at***imperial.ac.uk)

Abstract: Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use due to their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees with respect to the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a pre-specified probability $1 - \beta$. We then determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least $1 - \beta$. Our method involves the solution of tractable conic programs of moderate size.
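The worst-case policy optimization described above can be illustrated with a small sketch. The snippet below is not the paper's method (which builds observation-based confidence regions and solves conic programs); it instead uses a simpler, commonly studied stand-in, an L1-ball uncertainty set around an empirical transition estimate, combined with robust value iteration. All names, the MDP data, and the radius parameter are illustrative assumptions.

```python
import numpy as np

def worst_case_distribution(p_hat, values, radius):
    """Adversarial inner problem for one state-action pair: pick the
    transition distribution within an L1-ball of size `radius` around
    the empirical estimate `p_hat` that minimizes expected `values`.
    Solved greedily by shifting mass from the highest-value successor
    states to the lowest-value one (total mass moved = radius / 2)."""
    p = p_hat.copy()
    worst = int(np.argmin(values))
    budget = radius / 2.0
    for s in np.argsort(values)[::-1]:  # take mass from best states first
        if budget <= 1e-12:
            break
        if s == worst:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[worst] += move
        budget -= move
    return p

def robust_value_iteration(P_hat, R, gamma, radius, iters=500, tol=1e-10):
    """Max-min Bellman iteration: the decision maker maximizes over
    actions, the adversary minimizes over the uncertainty set."""
    n_s, n_a = R.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        Q = np.empty((n_s, n_a))
        for s in range(n_s):
            for a in range(n_a):
                p = worst_case_distribution(P_hat[s, a], V, radius)
                Q[s, a] = R[s, a] + gamma * (p @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

# Illustrative 2-state, 2-action MDP (numbers made up for the demo);
# P_hat[s, a] is the estimated distribution over successor states.
P_hat = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V_nominal, _ = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.0)
V_robust, pi_robust = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.2)
```

With radius 0 this reduces to ordinary value iteration on the point estimate; with a positive radius the robust values are weakly lower, reflecting the worst-case guarantee over the uncertainty set.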

Keywords: Robust Optimisation; Markov Decision Processes; Semidefinite Programming

Category 1: Robust Optimization

Category 2: Other Topics (Dynamic Programming)

Citation: Working paper, Department of Computing, Imperial College London, May 2010

Download: [PDF]

Entry Submitted: 05/05/2010
Entry Accepted: 05/05/2010
Entry Last Modified: 02/09/2012
