Optimization Online

Dimensionality Reduction for Classification - Comparison of Techniques and Dimension Choice

Frank Plastria (Frank.Plastria***at***vub.ac.be)
Steven De Bruyne (Steven.De.Bruyne***at***vub.ac.be)
Emilio Carrizosa (ecarrizosa***at***us.es)

Abstract: Dimensionality reduction is an important issue in classification problems today, since high-dimensional data sets are analyzed with sophisticated classification algorithms whose running times may be dramatically affected by the dimensionality of the data. We investigate the effects of dimensionality reduction, using different techniques and different target dimensions, as a pre-processing step on two-class data sets for two classification algorithms. Besides reducing the dimensionality with principal components and linear discriminants, we also introduce four new techniques. After this dimensionality reduction, two algorithms are applied. The first algorithm takes advantage of the reduced dimensionality itself, while the second directly exploits the dimensional ranking. We show on six two-class data sets with numerical attributes that, by carrying out this pre-processing effectively, we can make these algorithms generate classifiers that rival industry standards. The choice of the dimensionality has a significant impact. On the one hand, results show that it is worthwhile to avoid fixing the dimensionality without considering the data. On the other hand, and more importantly, we observe that common approaches that determine the dimensionality from the residual variance alone, treating the data separately from the classification algorithm, may be poor estimators if the goal is to maximize classification power.
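The abstract contrasts classifier-aware dimension choice with the common practice of picking the dimension from the residual variance of the principal components. As a rough illustration of that common baseline only, here is a minimal sketch assuming NumPy and a hypothetical residual-variance threshold `max_residual`; the function name `pca_reduce` is illustrative and this is neither the authors' procedure nor one of their four new techniques.

```python
import numpy as np


def pca_reduce(X, max_residual=0.05):
    """Project X onto the fewest principal components whose residual
    (unexplained) variance fraction is at most max_residual.

    Hypothetical helper for illustration; max_residual is an assumed threshold.
    Returns (projected data, chosen dimension k, residual fraction per k).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each attribute
    # SVD of the centered data: rows of Vt are the principal directions and
    # the squared singular values are proportional to the component variances.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    total = var.sum()
    if total == 0:
        raise ValueError("data has zero variance")
    residual = 1.0 - np.cumsum(var) / total  # variance left after k components
    hits = np.flatnonzero(residual <= max_residual)
    # Smallest k meeting the threshold; fall back to all components otherwise.
    k = int(hits[0]) + 1 if hits.size else len(var)
    return Xc @ Vt[:k].T, k, residual


# Rank-one example: all variance lies along the direction (1, 2, 0).
Z, k, residual = pca_reduce([[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]])
print(k, Z.shape)
```

Because this rule never looks at the class labels or at the classifier that will be trained afterwards, the `k` it returns need not be the dimension that yields the best classifier, which is the risk the abstract points out.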

Keywords: Dimensionality Reduction, Dimension Choice, Classification, Principal Components, PCA, Linear Discriminants, Fisher, LDA, Principal Separation Components, PSC, Mean Components, PMC, LMD, PMSC, Optimal Distance Separating Hyperplane, ODSH, Eigenvalue-based Classification Tree, EVCT

Category 1: Applications -- Science and Engineering (Data-Mining)

Citation: To appear in Lecture Notes in Artificial Intelligence (Springer).

Entry Submitted: 04/23/2008
Entry Accepted: 04/23/2008
Entry Last Modified: 05/08/2008