Optimization Online


Batch Learning in Stochastic Dual Dynamic Programming

Daniel Ávila (daniel.avila***at***uclouvain.be)
Anthony Papavasiliou (anthony.papavasiliou***at***uclouvain.be)
Nils Löhndorf (nils.loehndorf***at***uni.lu)

Abstract: We consider the stochastic dual dynamic programming (SDDP) algorithm, a widely used method for multistage stochastic programming, and propose a variant that uses batch learning, a technique applied with success in the reinforcement learning framework. We cast SDDP as a type of Q-learning algorithm and describe its application in both risk-neutral and risk-averse settings. We demonstrate the efficiency of the algorithm on a lost sales inventory control problem with lead times, as well as on a real-world instance of the long-term planning problem of interconnected hydropower plants in Colombia. We find that the proposed technique produces tighter optimality gaps in a shorter amount of time than conventional SDDP, including the PSR SDDP commercial software. We also find that parallel computation of SDDP backward passes benefits from batch learning.
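The core idea of the abstract — gathering cuts from a batch of sampled scenarios before updating the value function approximation, rather than adding one cut per iteration — can be sketched in a toy form. The sketch below is an illustrative assumption, not the authors' implementation: the one-dimensional subproblem, the `CutApproximation` class, and the `sample_cut` helper are all hypothetical stand-ins for the stage subproblems SDDP actually solves.

```python
import random

# Hypothetical 1-D toy. A "cut" is an affine lower bound on the
# cost-to-go function: theta >= intercept + slope * x.

class CutApproximation:
    """Piecewise-linear lower approximation of the value function."""
    def __init__(self):
        self.cuts = []  # list of (intercept, slope) pairs

    def evaluate(self, x):
        # The approximation is the pointwise maximum of all cuts.
        if not self.cuts:
            return 0.0
        return max(a + b * x for a, b in self.cuts)

    def add_cuts(self, batch):
        # Batch learning: add all cuts gathered in one backward pass
        # at once, instead of one cut per iteration as in
        # conventional SDDP.
        self.cuts.extend(batch)

def sample_cut(x, noise):
    # Hypothetical stage subproblem: the true cost-to-go is |x - noise|;
    # its subgradient at x yields an exact supporting cut.
    slope = 1.0 if x >= noise else -1.0
    value = abs(x - noise)
    return (value - slope * x, slope)

random.seed(0)
approx = CutApproximation()
x_trial = 0.5
# Sample a batch of scenario realizations and collect one cut per
# scenario; all cuts are then added to the approximation together.
batch = [sample_cut(x_trial, random.uniform(-1, 1)) for _ in range(8)]
approx.add_cuts(batch)
print(len(approx.cuts))          # 8 cuts added in a single batch
print(approx.evaluate(x_trial))  # lower bound at the trial point
```

Since each cut is tight at the trial point for its scenario, the batched approximation evaluated there equals the worst-case sampled cost, illustrating how a batch of cuts tightens the lower bound faster than a single cut per pass.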

Keywords: Stochastic programming, Dynamic programming, Reinforcement learning, SDDP, Parallel Computing

Category 1: Stochastic Programming

Category 2: Other Topics (Dynamic Programming)


Download: [PDF]

Entry Submitted: 05/17/2021
Entry Accepted: 05/17/2021
Entry Last Modified: 05/17/2021
