  


Stochastic PrimalDual Methods and Sample Complexity of Reinforcement Learning
Yichen Chen(yichencprinceton.edu) Abstract: We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic PrimalDual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and has low computational complexity per iteration. The SPD methods find an absolute$\epsilon$optimal policy, with high probability, using $\mathcal{O}\left(\frac{\mathcal{S}^4 \mathcal{A}^2\sigma^2 }{(1\gamma)^6\epsilon^2} \right)$ iterations/samples for the infinitehorizon discountedreward MDP and $\mathcal{O}\left(\frac{\mathcal{S}^4 \mathcal{A}^2H^6\sigma^2 }{\epsilon^2} \right)$ for the finitehorizon MDP. %This provides a scalable method with theoretical guarantees that nearly matches the theoretical lower bound. Keywords: Reinforcement Learning, Stochastic PrimalDual Methods Category 1: Other Topics (Dynamic Programming ) Citation: Download: [PDF] Entry Submitted: 12/07/2016 Modify/Update this entry  
Visitors  Authors  More about us  Links  
Subscribe, Unsubscribe Digest Archive Search, Browse the Repository

Submit Update Policies 
Coordinator's Board Classification Scheme Credits Give us feedback 
Optimization Journals, Sites, Societies  