Optimization Online


Temporal Difference Learning with Kernels for Pricing American-Style Options

Kengy Barty (kengy.barty***at***cermics.enpc.fr)
Jean-Sebastien Roy (jean-sebastien.roy***at***edf.fr)
Cyrille Strugarek (cyrille.strugarek***at***edf.fr)

Abstract: In this paper we study the problem of estimating the cost-to-go function of an infinite-horizon discounted Markov chain with a possibly continuous state space. For implementation purposes, the state space is typically discretized, but as soon as its dimension becomes large the computation is no longer practicable, a phenomenon referred to as the curse of dimensionality. Approximation methods for dynamic programming are therefore of major importance. A powerful such method, often referred to as neuro-dynamic programming, represents the Bellman function as a linear combination of a priori defined functions, called neurons. In this article, we propose an alternative approach, closely related to temporal differences, based on functional gradient descent and using an infinite kernel basis. Furthermore, our algorithm, though aimed at infinite-dimensional problems, is implementable in practice. We prove the convergence of this algorithm and demonstrate applications, e.g., to Bermudan option pricing.
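The kernel-based temporal-difference idea sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch (not the authors' implementation): the value function is kept as a linear combination of kernels centered at visited states, and each TD step performs a functional gradient update by appending a new kernel weighted by the stepsize times the temporal difference. The Gaussian kernel, the bandwidth, and the stepsize schedule are illustrative assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth=0.5):
    """Illustrative Gaussian kernel on a one-dimensional state space."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

class KernelTD:
    """Kernel-based TD(0) sketch: V is represented as a growing
    linear combination of kernels centered at the visited states,
    V(x) = sum_i alpha_i K(x_i, x)."""

    def __init__(self, gamma=0.9, bandwidth=0.5):
        self.gamma = gamma          # discount factor of the Markov chain
        self.bandwidth = bandwidth  # kernel bandwidth (assumed fixed here)
        self.centers = []           # visited states x_i
        self.weights = []           # coefficients alpha_i

    def value(self, x):
        # Evaluate V(x) = sum_i alpha_i K(x_i, x).
        return sum(a * gaussian_kernel(c, x, self.bandwidth)
                   for a, c in zip(self.weights, self.centers))

    def update(self, x, reward, x_next, step):
        # Temporal difference: delta = r + gamma * V(x') - V(x).
        delta = reward + self.gamma * self.value(x_next) - self.value(x)
        # Functional gradient step: add a kernel centered at x,
        # scaled by the stepsize times the temporal difference.
        self.centers.append(x)
        self.weights.append(step * delta)
        return delta
```

With a vanishing stepsize (as in a Robbins-Monro scheme), repeated updates along simulated trajectories drive the temporal differences toward zero; for pricing, the reward would be the discounted exercise payoff along simulated paths of the underlying.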

Keywords: TD Learning, Robbins-Monro Algorithm, Kernels

Category 1: Other Topics (Dynamic Programming )

Category 2: Stochastic Programming

Citation: EDF R&D Internal Report May 2005

Download: [PDF]

Entry Submitted: 05/19/2005
Entry Accepted: 05/19/2005
Entry Last Modified: 06/09/2005
