Optimization Online


Sample Size Selection in Optimization Methods for Machine Learning

Richard Byrd(Richard.Byrd***at***colorado.edu)
Gillian M Chin(gillianmchin***at***gmail.com)
Jorge Nocedal(nocedal***at***eecs.northwestern.edu)
Yuchen Wu(rainson.wood***at***gmail.com)

Abstract: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and we establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian-vector products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The third part of the paper shifts the focus to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
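The variance-based sample-size criterion described in the first part can be sketched roughly as follows: grow the sample whenever the estimated variance of the sampled gradient dominates the squared gradient norm, so that the sampled direction remains a descent direction with high confidence. This is a minimal illustrative sketch, not the paper's exact algorithm; the helper name, the threshold parameter `theta`, and the use of a per-component variance estimate are assumptions for illustration.

```python
import numpy as np

def should_increase_sample(grad_samples, theta=0.5):
    """Hypothetical variance test for dynamic sample-size selection.

    grad_samples: array of shape (|S|, d), the per-example gradients
    over the current sample S. Returns True when the sample variance
    of the batch gradient is too large relative to its norm, i.e. the
    sample S should be enlarged before the next gradient step.
    """
    n = grad_samples.shape[0]
    g = grad_samples.mean(axis=0)            # sampled (batch) gradient
    var = grad_samples.var(axis=0, ddof=1)   # per-component sample variance
    # Enlarge |S| when the variance of the sample mean exceeds a
    # fraction theta^2 of the squared gradient norm.
    return var.sum() / n > theta**2 * np.dot(g, g)
```

With low-noise per-example gradients the test returns False (the current sample suffices), while highly noisy gradients trigger an increase.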

Keywords: optimization, machine learning, L1 regularization

Category 1: Nonlinear Optimization

Category 2: Stochastic Programming

Citation: unpublished

Download: [PDF]

Entry Submitted: 11/02/2011
Entry Accepted: 11/02/2011
Entry Last Modified: 11/02/2011
