On the Use of Stochastic Hessian Information in Unconstrained Optimization

Richard Byrd (richard.byrd***at***colorado.edu)
Gillian Chin (gillianmchin***at***gmail.com)
Will Neveitt (wneveitt***at***google.com)
Jorge Nocedal (nocedal***at***eecs.northwestern.edu)

Abstract: This paper describes how to incorporate stochastic curvature information in a Newton-CG method and in a limited-memory quasi-Newton method for large-scale optimization. The motivation for this work stems from statistical learning and stochastic optimization applications in which the objective function is the sum of a very large number of loss terms and can be evaluated with varying degrees of precision. Curvature information is incorporated into the two proposed semi-stochastic algorithms via a matrix-free conjugate gradient iteration, which is applied to a system using a sampled (or stochastic) Hessian based on a small batch size. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
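
Illustration: the matrix-free conjugate gradient iteration on a sampled Hessian can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the logistic loss, the full-batch gradient, the small damping term, and all function names (full_gradient, sampled_hessian_vec, cg_solve, newton_cg_direction) are hypothetical choices made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_gradient(w, X, y):
    # Gradient of the average logistic loss over all examples (labels in {0, 1}).
    # Assumption: the gradient uses the whole data set; only the Hessian is sampled.
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def sampled_hessian_vec(w, X_s, v, damping=1e-8):
    # Product of the sampled Hessian with v, computed without forming the matrix.
    # X_s holds only the rows in the small Hessian sample; damping is an assumption.
    p = sigmoid(X_s @ w)
    return X_s.T @ (p * (1.0 - p) * (X_s @ v)) / X_s.shape[0] + damping * v

def cg_solve(hess_vec, g, max_iter=10, tol=1e-10):
    # Conjugate gradient on H d = -g, started at d = 0.
    # hess_vec is a callable v -> H v, so H is never stored.
    d = np.zeros_like(g)
    r = -g.copy()          # residual (-g) - H d at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def newton_cg_direction(w, X, y, sample_size, rng, cg_iters=10):
    # One search direction: full gradient, Hessian sampled on a small batch.
    g = full_gradient(w, X, y)
    idx = rng.choice(X.shape[0], size=sample_size, replace=False)
    X_s = X[idx]
    d = cg_solve(lambda v: sampled_hessian_vec(w, X_s, v), g, max_iter=cg_iters)
    return d, g
```

Each CG iteration costs one Hessian-vector product over the small sample, so the curvature work per step scales with the sample size and the number of CG iterations rather than with the full number of loss terms. A step length (for example from a line search) would still be needed before updating w.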

Keywords: Machine Learning, Unconstrained Optimization, Newton's method, limited memory BFGS method

Category 1: Nonlinear Optimization

Category 2: Stochastic Programming

Citation: unpublished: Technical Report, Northwestern University, Optimization Center, 2010/05

Entry Submitted: 06/16/2010
Entry Accepted: 06/16/2010
Entry Last Modified: 06/16/2010

