Optimization Online


Reinforcement Learning via Parametric Cost Function Approximation for Multistage Stochastic Programming

Saeed Ghadimi (sghadimi***at***princeton.edu)
Raymond Perkins (raymondp***at***princeton.edu)
Warren Powell (powell***at***princeton.edu)

Abstract: The most common approaches for solving stochastic resource allocation problems in the research literature are to use either value functions ("dynamic programming") or scenario trees ("stochastic programming") to approximate the impact of a decision now on the future. By contrast, common industry practice is to use a deterministic approximation of the future, which is easier to understand and solve but which is criticized for ignoring uncertainty. We show that a parameterized version of a deterministic lookahead can be an effective way of handling uncertainty while enjoying the computational simplicity of a deterministic lookahead. We present the parameterized lookahead model as a form of policy for solving a stochastic base model, which is used as the basis for optimizing the parameterized policy. This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees. We formalize this approach and demonstrate its use in the context of a complex, nonstationary energy storage problem.
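To illustrate the idea of a parametric cost function approximation, the following is a minimal sketch (not the authors' implementation) of a toy energy storage setting: the policy solves a deterministic lookahead built from a point forecast of prices, modified by a tunable buffer parameter theta, and theta is then optimized by simulating the policy against the stochastic base model. All names, the forecast model, and the grid-search tuning are illustrative assumptions.

```python
import random

random.seed(0)

HORIZON = 24       # hours in one simulated episode (assumed)
CAPACITY = 10.0    # storage capacity in energy units (assumed)

def price_forecast(t):
    # deterministic point forecast of the electricity price (hypothetical model)
    return 30.0 + 10.0 * ((t % 24) / 24.0)

def sample_price(t):
    # stochastic base model: realized price = forecast + Gaussian noise
    return price_forecast(t) + random.gauss(0.0, 5.0)

def cfa_policy(state, t, theta):
    # parameterized deterministic lookahead: compare the current forecast,
    # shifted by the buffer theta, against the near-horizon average forecast;
    # charge when prices look cheap, discharge when they look expensive
    avg = sum(price_forecast(s) for s in range(t, t + 6)) / 6.0
    if price_forecast(t) + theta < avg:
        return min(1.0, CAPACITY - state)    # charge up to 1 unit
    if price_forecast(t) - theta > avg:
        return -min(1.0, state)              # discharge up to 1 unit
    return 0.0

def simulate(theta, n_reps=200):
    # estimate the expected profit of the policy under the base model
    total = 0.0
    for _ in range(n_reps):
        state, profit = 0.0, 0.0
        for t in range(HORIZON):
            x = cfa_policy(state, t, theta)
            profit -= sample_price(t) * x    # pay to charge, earn to discharge
            state += x
        total += profit
    return total / n_reps

# policy search over the tunable parameter (a crude grid search stand-in
# for the simulation-based optimization described in the abstract)
best_theta = max([0.0, 1.0, 2.0, 3.0], key=simulate)
```

The key point is that the lookahead model stays deterministic and easy to solve; uncertainty is handled by tuning theta so that the policy performs well in simulation, rather than by enumerating scenarios.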

Keywords: Stochastic Optimization, Policy Search, Stochastic Programming, Simulation-based Optimization, Parametric Cost Function Approximation

Category 1: Stochastic Programming

Category 2: Other Topics (Optimization of Simulated Systems)

Category 3: Applications -- OR and Management Sciences


Download: [PDF]

Entry Submitted: 12/27/2018
Entry Accepted: 12/28/2018
Entry Last Modified: 10/06/2019
