Optimization Online


Optimal Decision Trees for Categorical Data via Integer Programming

Oktay Gunluk(gunluk***at***us.ibm.com)
Jayant Kalagnanam(jayant***at***us.ibm.com)
Matt Menickelli(mmenickelly***at***anl.gov )
Katya Scheinberg (katyas***at***lehigh.edu)

Abstract: Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.

Keywords: Decision Trees, Integer Programming, Machine Learning

Category 1: Integer Programming


Download: [PDF]

Entry Submitted: 01/02/2018
Entry Accepted: 01/03/2018
Entry Last Modified: 01/02/2018

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society