Optimization Online


Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

Mengdi Wang (mengdiw***at***princeton.edu)

Abstract: We propose a randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality, the algorithm adaptively samples state transitions and makes exponentiated primal-dual updates. We show that it finds an ε-optimal policy in nearly-linear running time in the worst case. For Markov decision processes that are ergodic under every stationary policy, we show that the algorithm finds an ε-optimal policy in running time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide new complexity benchmarks for solving stochastic dynamic programs.
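The primal-dual scheme described in the abstract can be illustrated with a toy sketch: maintain a value vector v (primal iterate) and a distribution mu over state-action pairs (dual iterate), adaptively sample a state-action pair from mu, sample one transition from the model, and apply a gradient step to v and a multiplicative-weight (exponentiated) step to mu. The step sizes, the sampling rule, and the small test MDP below are illustrative assumptions for exposition, not the paper's exact algorithm or parameters.

```python
import numpy as np

def randomized_pd_mdp(P, r, gamma, iters=20000, alpha=0.05, beta=0.05, seed=0):
    """Toy randomized primal-dual iteration for the discounted-MDP LP.

    P: transition tensor, shape (S, A, S); r: rewards, shape (S, A).
    Step sizes alpha, beta and the single-entry dual update are
    illustrative choices, not the paper's scheme.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    v = np.zeros(S)                        # primal iterate: value estimates
    mu = np.full((S, A), 1.0 / (S * A))    # dual iterate: distribution over (s, a)
    nu = np.full(S, 1.0 / S)               # initial-state distribution (assumed uniform)
    for _ in range(iters):
        # Adaptively sample a state-action pair from the current dual iterate,
        # then sample one transition from the model.
        idx = rng.choice(S * A, p=mu.ravel())
        i, a = divmod(idx, A)
        j = rng.choice(S, p=P[i, a])
        # Sampled residual of the LP constraint v_i >= r_ia + gamma (P_a v)_i.
        delta = r[i, a] + gamma * v[j] - v[i]
        # Primal step: unbiased stochastic gradient of the Lagrangian in v.
        v -= alpha * (1.0 - gamma) * nu
        v[i] += alpha
        v[j] -= alpha * gamma
        # Dual step: exponentiated (multiplicative-weight) update, renormalized.
        mu[i, a] *= np.exp(beta * delta)
        mu /= mu.sum()
    return v, mu

# A 2-state, 2-action MDP where action 0 always pays 1 and action 1 pays 0,
# with identical (uniform) transitions, so action 0 is optimal everywhere.
P = np.full((2, 2, 2), 0.5)
r = np.array([[1.0, 0.0], [1.0, 0.0]])
v, mu = randomized_pd_mdp(P, r, gamma=0.9)
policy = mu.argmax(axis=1)  # greedy policy read off the dual iterate
```

In this sketch the dual iterate plays the role of an occupancy measure: as the iteration proceeds, mass concentrates on the better state-action pairs, and a near-optimal policy is extracted by taking the argmax of mu in each state.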

Keywords: Markov decision process, randomized algorithm, linear programming, duality, primal-dual method, running-time complexity, stochastic approximation

Category 1: Other Topics (Dynamic Programming)

Category 2: Linear, Cone and Semidefinite Programming (Linear Programming)


Download: [PDF]

Entry Submitted: 04/05/2017
Entry Accepted: 04/06/2017
Entry Last Modified: 09/13/2017
