Optimization Online


Best Subset Selection via Cross-validation Criterion

Yuichi Takano (ytakano***at***sk.tsukuba.ac.jp)
Ryuhei Miyashiro (r-miya***at***cc.tuat.ac.jp)

Abstract: This paper is concerned with the cross-validation criterion for best subset selection in a linear regression model. In contrast to statistical criteria (e.g., Mallows' $C_p$, AIC, BIC, and various other information criteria), cross-validation requires only mild assumptions, namely that samples are identically distributed and that training and validation samples are independent. For this reason, the cross-validation criterion is expected to work well in most situations and for any predictive method. The purpose of this paper is to establish a mixed-integer optimization (MIO) approach to selecting the best subset of explanatory variables via the cross-validation criterion. This subset selection problem can be formulated as a bilevel MIO problem. We then reduce it to a mixed-integer quadratic optimization problem, which can be solved exactly using optimization software. The efficacy of our method is evaluated through simulation experiments by comparison with statistical-criterion-based exhaustive search algorithms and $L_1$-regularized regression. Simulation results demonstrate that our method delivered good performance in both subset selection accuracy and predictive performance when the signal-to-noise ratio was low.
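
To make the criterion concrete, the following is a minimal sketch of best subset selection by k-fold cross-validation using plain enumeration. It is not the MIO reformulation described in the abstract: it fits ordinary least squares on each training fold for every candidate subset, which is only practical for a handful of variables. The function names `cv_error` and `best_subset_by_cv`, the fold count, and the use of unregularized least squares are all assumptions made for illustration.

```python
import itertools
import numpy as np


def cv_error(X, y, subset, folds):
    """Sum of squared validation residuals for least squares on `subset`.

    Hypothetical helper: `folds` is a list of index arrays, each used once
    as the validation set while the remaining samples form the training set.
    """
    cols = np.asarray(subset, dtype=int)
    total = 0.0
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        # Design matrices with an intercept column plus the selected variables.
        A_train = np.column_stack(
            [np.ones(int(train_mask.sum())), X[train_mask][:, cols]]
        )
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx][:, cols]])
        # Fit on the training fold only, then score on the held-out fold.
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        resid = y[val_idx] - A_val @ coef
        total += float(resid @ resid)
    return total


def best_subset_by_cv(X, y, n_folds=5, seed=0):
    """Enumerate all subsets and return the one with the lowest CV error."""
    rng = np.random.default_rng(seed)
    # The same random partition is reused for every candidate subset.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    p = X.shape[1]
    candidates = (
        s for k in range(p + 1) for s in itertools.combinations(range(p), k)
    )
    return min(candidates, key=lambda s: cv_error(X, y, s, folds))
```

Calling `best_subset_by_cv(X, y)` on an `(n, p)` array `X` and a length-`n` vector `y` returns a tuple of selected column indices. Because the number of subsets grows as $2^p$, enumeration of this kind is the baseline that an exact optimization formulation is meant to replace.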

Keywords: Integer programming, Subset selection, Cross-validation, Ridge regression, Statistics

Category 1: Applications -- Science and Engineering (Statistics)

Category 2: Integer Programming


Entry Submitted: 01/13/2019
Entry Accepted: 01/13/2019
Entry Last Modified: 01/13/2019
