Optimization Online


Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Napat Rujeerapaiboon (napat.rujeerapaiboon***at***epfl.ch)
Kilian Schindler (kilian.schindler***at***epfl.ch)
Daniel Kuhn (daniel.kuhn***at***epfl.ch)
Wolfram Wiesemann (ww***at***imperial.ac.uk)

Abstract: Plain vanilla K-means clustering is prone to producing unbalanced clusters and is sensitive to outliers. To mitigate both shortcomings, we formulate a joint outlier-detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions into high-quality solutions for the MILP. We prove that these solutions are optimal in the MILP if a cluster separation condition holds. To the best of our knowledge, we propose the first tractable solution scheme for the joint outlier-detection and clustering problem with optimality guarantees.
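
To make the problem statement concrete, the following minimal sketch solves a tiny instance by exhaustive enumeration. It is not the MILP, the relaxations, or the rounding schemes of the paper; it only illustrates the objective and the constraints described above. The function names (`kmeans_cost`, `brute_force`), the toy dataset, and the convention that the last label marks the outlier cluster are assumptions made for illustration.

```python
from itertools import product


def kmeans_cost(points, labels, num_clusters):
    """Sum of squared distances of each point to its cluster centroid.

    Points labeled `num_clusters` belong to the auxiliary outlier
    cluster and do not contribute to the cost (illustrative convention).
    """
    total = 0.0
    for k in range(num_clusters):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def brute_force(points, sizes, num_outliers):
    """Enumerate all labelings that respect the prescribed cluster sizes
    and outlier count, and return the cheapest one.

    Exponential in the number of points; only usable on toy instances.
    """
    num_clusters = len(sizes)
    target = list(sizes) + [num_outliers]
    best_cost, best_labels = float("inf"), None
    for labels in product(range(num_clusters + 1), repeat=len(points)):
        counts = [labels.count(k) for k in range(num_clusters + 1)]
        if counts != target:
            continue  # violates a cardinality constraint
        cost = kmeans_cost(points, labels, num_clusters)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


# Toy data: two tight groups of three points and one far-away point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (50.0, 50.0),
]
cost, labels = brute_force(points, sizes=(3, 3), num_outliers=1)
print(cost, labels)
```

On this toy instance the far-away point is assigned to the outlier cluster and the two tight groups form the two clusters of size three. Enumeration scales exponentially with the number of datapoints, which is why tractable relaxations with optimality guarantees are of interest.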

Keywords: Semidefinite programming, K-means clustering, outlier detection, optimality guarantee

Category 1: Linear, Cone and Semidefinite Programming

Category 2: Combinatorial Optimization (Approximation Algorithms)

Category 3: Applications -- Science and Engineering (Data-Mining)



Entry Submitted: 05/22/2017
Entry Accepted: 05/22/2017
Entry Last Modified: 10/05/2017

