Optimization Online


Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

Yichen Chen(yichenc***at***princeton.edu)
Mengdi Wang(mengdiw***at***princeton.edu)

Abstract: We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and has low computational complexity per iteration. The SPD methods find an absolute-$\epsilon$-optimal policy, with high probability, using $\mathcal{O}\left(\frac{|\mathcal{S}|^4 |\mathcal{A}|^2\sigma^2 }{(1-\gamma)^6\epsilon^2} \right)$ iterations/samples for the infinite-horizon discounted-reward MDP and $\mathcal{O}\left(\frac{|\mathcal{S}|^4 |\mathcal{A}|^2H^6\sigma^2 }{\epsilon^2} \right)$ for the finite-horizon MDP. %This provides a scalable method with theoretical guarantees that nearly matches the theoretical lower bound.

Keywords: Reinforcement Learning, Stochastic Primal-Dual Methods

Category 1: Other Topics (Dynamic Programming )


Download: [PDF]

Entry Submitted: 12/07/2016
Entry Accepted: 12/08/2016
Entry Last Modified: 12/07/2016

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society