Optimization Online





 

2-class Internal Cross-validation Pruned Eigen Transformation Classification Trees

Steven De Bruyne (Steven.De.Bruyne***at***vub.ac.be)
Frank Plastria (Frank.Plastria***at***vub.ac.be)

Abstract: Decision trees built in a feature space produced by an eigen transformation have been shown to be competitive with industry standards. Unfortunately, it is not self-evident which transformation to select or how many dimensions of the feature space to retain. These trees, however, have properties that can be exploited. Because the eigenvalues rank the features by importance, the order of the splits is fixed, and every tree is therefore a pruned version of the largest tree. This property makes it possible to prune such a tree by an internal cross-validation on the training data. The technique should overfit less than, for example, the estimated error rates that C4.5 classification trees use for pruning, while still using the entire training data to build the tree. We therefore present an algorithm that divides the training data into folds, as in a cross-validation. The split values are calculated on each of the resulting internal training folds. The nodes of the tree are then evaluated on the corresponding internal test folds, and nodes that overfit are pruned. This procedure is repeated for each of the eigen transformations. The best tree is selected, and the final split values for the selected pruned tree are then calculated from the entire training data. Results show that the resulting trees can be expected to be optimal or near optimal when there is enough training data relative to the size of the tree.
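
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: PCA stands in for "an eigen transformation", the split value of a node is taken as the midpoint of the two class means, a node at depth k always splits on the k-th transformed feature, and the transformation is computed once on the whole training set so that node positions match across folds. All function names (pca_transform, build, leaf_errors, select, fit_pruned_tree, predict) are hypothetical.

```python
import numpy as np

def pca_transform(X):
    """Assumed stand-in for one eigen transformation: PCA, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    W = vecs[:, np.argsort(vals)[::-1]]
    return (X - X.mean(axis=0)) @ W

def build(Z, y, depth, max_depth, keep=None, path=""):
    """Grow a tree with a fixed split order: depth k splits on feature k.
    If `keep` is given, only the node paths it contains are split."""
    node = {"label": int(2 * y.sum() >= len(y))}  # majority class (ties -> 1)
    splittable = depth < max_depth and 0 < y.sum() < len(y)
    if splittable and (keep is None or path in keep):
        f = Z[:, depth]
        # Assumed split rule: midpoint between the two class means.
        node["t"] = (f[y == 0].mean() + f[y == 1].mean()) / 2
        left = f <= node["t"]
        node["L"] = build(Z[left], y[left], depth + 1, max_depth, keep, path + "L")
        node["R"] = build(Z[~left], y[~left], depth + 1, max_depth, keep, path + "R")
    return node

def leaf_errors(node, Z, y, depth, max_depth, path, acc):
    """Per node path, add the test errors made if that node were a leaf."""
    acc[path] = acc.get(path, 0) + int((y != node["label"]).sum())
    if depth == max_depth:
        return
    if "t" in node:
        left = Z[:, depth] <= node["t"]
        leaf_errors(node["L"], Z[left], y[left], depth + 1, max_depth, path + "L", acc)
        leaf_errors(node["R"], Z[~left], y[~left], depth + 1, max_depth, path + "R", acc)
    else:
        # This fold's tree stopped early: charge its errors to the left chain
        # so that subtree totals stay comparable across folds.
        leaf_errors(node, Z, y, depth + 1, max_depth, path + "L", acc)

def select(acc, path, depth, max_depth, keep):
    """Bottom-up pruning: keep a split only if it strictly lowers the summed test errors."""
    as_leaf = acc.get(path, 0)
    if depth == max_depth:
        return as_leaf
    sub = (select(acc, path + "L", depth + 1, max_depth, keep)
           + select(acc, path + "R", depth + 1, max_depth, keep))
    if sub < as_leaf:
        keep.add(path)
        return sub
    return as_leaf

def fit_pruned_tree(Z, y, max_depth, folds=5, seed=0):
    """Prune by internal cross-validation, then refit split values on all data."""
    fold_of = np.random.default_rng(seed).permutation(len(y)) % folds
    acc = {}
    for k in range(folds):
        tr, te = fold_of != k, fold_of == k
        fold_tree = build(Z[tr], y[tr], 0, max_depth)
        leaf_errors(fold_tree, Z[te], y[te], 0, max_depth, "", acc)
    keep = set()
    cv_errors = select(acc, "", 0, max_depth, keep)
    return build(Z, y, 0, max_depth, keep), cv_errors

def predict(node, z):
    """Route one transformed sample down the tree and return the leaf label."""
    depth = 0
    while "t" in node:
        node = node["L"] if z[depth] <= node["t"] else node["R"]
        depth += 1
    return node["label"]

# Synthetic two-class data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, 0, 0], 1.0, (100, 3)),
               rng.normal([2, 0, 0], 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
Z = pca_transform(X)
tree, cv_errors = fit_pruned_tree(Z, y, max_depth=3)
accuracy = np.mean([predict(tree, z) == yi for z, yi in zip(Z, y)])
```

Because the split order is fixed, a node is identified by its path from the root ("", "L", "LR", ...), which is what lets the errors from different folds be summed per node. To compare several eigen transformations under the same assumptions, one could call fit_pruned_tree once per transformed data set and keep the tree with the lowest cv_errors.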

Keywords: Classification, Classification Tree, Dimension Reduction, Eigen Transformation, Internal Cross-validation, Eigenvalue-based Classification Tree, EVCT, 2-class Internal Cross-validation Pruned Eigen Transformation Classification Trees, 2CIXVPETCT

Category 1: Applications -- Science and Engineering (Data-Mining)

Citation:

Download: [PDF]

Entry Submitted: 05/07/2008
Entry Accepted: 05/07/2008
Entry Last Modified: 05/07/2008
