  


Lower Bound On the Computational Complexity of Discounted Markov Decision Problems
Mengdi Wang(mengdiwprinceton.edu) Abstract: We study the computational complexity of the infinitehorizon discountedreward Markov Decision Problem (MDP) with a finite state space $\cS$ and a finite action space $\cA$. We show that any randomized algorithm needs a running time at least $\Omega(\carS^2\carA)$ to compute an $\epsilon$optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left( \frac{\carS \carA}{\epsilon} \right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of input. Keywords: Markov decision process, computational complexity Category 1: Other Topics (Dynamic Programming ) Category 2: Linear, Cone and Semidefinite Programming (Linear Programming ) Citation: Download: [PDF] Entry Submitted: 05/20/2017 Modify/Update this entry  
Visitors  Authors  More about us  Links  
Subscribe, Unsubscribe Digest Archive Search, Browse the Repository

Submit Update Policies 
Coordinator's Board Classification Scheme Credits Give us feedback 
Optimization Journals, Sites, Societies  