Optimization Online


Approximating K-means-type clustering via semidefinite programming

Jiming Peng (pengj***at***mcmaster.ca)
Yu Wei (weiy3***at***mcmaster.ca)

Abstract: One of the fundamental clustering problems is to assign $n$ points into $k$ clusters based on the minimal sum-of-squares(MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP). We show that our 0-1 SDP model provides an unified framework for several clustering approaches such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via the relaxed linear and semidefinite programming. Secondly, we consider the issue of how to extract a feasible solution of the original MSSC model from the approximate solution of the relaxed SDP problem. By using principal component analysis, we develop a rounding procedure to construct a feasible partitioning from a solution of the relaxed problem. We show that for bi-clustering, our algorithm can provide a 2-approximate solution to the original problem in $O(n\log n)$ time. Promising numerical results based on our new method are reported.

Keywords: K-means clustering, Principal component analysis, Semidefinite programming, Approximation

Category 1: Applications -- Science and Engineering (Data-Mining )

Category 2: Combinatorial Optimization (Approximation Algorithms )

Category 3: Convex and Nonsmooth Optimization (Convex Optimization )


Download: [Postscript][PDF]

Entry Submitted: 04/22/2005
Entry Accepted: 04/22/2005
Entry Last Modified: 02/22/2006

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Programming Society