Reinforcement Learning via Parametric Cost Function Approximation for Multistage Stochastic Programming

Saeed Ghadimi (sghadimi***at***princeton.edu)
Raymond Perkins (raymondp***at***princeton.edu)
Warren Powell (powell***at***princeton.edu)

Abstract: The most common approaches in the research literature for solving stochastic resource allocation problems use either value functions ("dynamic programming") or scenario trees ("stochastic programming") to approximate the impact of a decision now on the future. By contrast, common industry practice is to use a deterministic approximation of the future, which is easier to understand and solve but is criticized for ignoring uncertainty. We show that a parameterized version of a deterministic lookahead can be an effective way of handling uncertainty while retaining the computational simplicity of a deterministic lookahead. We present the parameterized lookahead model as a form of policy for solving a stochastic base model, which is used as the basis for optimizing the parameterized policy. This approach can handle complex, high-dimensional state variables and avoids the usual approximations associated with scenario trees. We formalize this approach and demonstrate its use in the context of a complex, nonstationary energy storage problem.

Keywords: Stochastic Optimization, Policy Search, Stochastic Programming, Simulation-based Optimization, Parametric Cost Function Approximation

Category 1: Stochastic Programming

Category 2: Other Topics (Optimization of Simulated Systems)

Category 3: Applications -- OR and Management Sciences



Entry Submitted: 12/27/2018
Entry Accepted: 12/28/2018
Entry Last Modified: 10/06/2019

