A Framework for Kernel Regularization with Applications to Protein Clustering
Fan Lu (flustat.wisc.edu)
Abstract: We develop and apply a novel framework designed to extract information, in the form of a positive definite kernel matrix, from possibly crude, noisy, incomplete, or inconsistent dissimilarity information between pairs of objects, as is obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of coordinates in Euclidean space that attempts to respect the available information while controlling the complexity of the kernel. The resulting coordinates are well suited to visualization and to use as input to classification and clustering algorithms. The framework is formulated as a class of optimization problems that can be solved efficiently using modern convex cone programming software. The power of the method is illustrated in the context of protein clustering based on primary sequence data. An application to the globin family of proteins yielded a readily visualizable 3D sequence space of globins, in which several sub-families and sub-groupings consistent with the literature were easily identifiable.
Keywords: Regularized Kernel Estimation, positive definite matrices, noisy dissimilarity data, convex cone programming, protein clustering, globin family, support vector machines, classification
Category 1: Applications -- Science and Engineering (Statistics)
Category 2: Applications -- Science and Engineering (Biomedical Applications)
Category 3: Linear, Cone and Semidefinite Programming
Citation: Technical Report No. 1107, Department of Statistics, University of Wisconsin, Madison, May 2005.
Entry Submitted: 05/06/2005
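Illustration: The abstract's remark that any positive definite kernel defines a consistent set of distances can be made concrete with the standard identity d_ij^2 = K_ii + K_jj - 2 K_ij. The sketch below is not the report's estimation procedure. It only assumes a kernel built as a Gram matrix of hypothetical 3D coordinates (the names `gram_matrix`, `kernel_distances`, and the point values are illustrative) and checks that the kernel-induced distances agree with ordinary Euclidean distances between those coordinates.

```python
import math


def gram_matrix(points):
    """Linear kernel K[i][j] = <x_i, x_j>; positive semidefinite by construction."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def kernel_distances(K):
    """Squared distances induced by a kernel: d_ij^2 = K_ii + K_jj - 2 * K_ij."""
    n = len(K)
    return [[K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n)] for i in range(n)]


# Hypothetical 3D coordinates for four objects (illustrative values only).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 2.0)]

K = gram_matrix(points)
D2 = kernel_distances(K)

# Compare kernel-induced distances with direct Euclidean distances.
for i, p in enumerate(points):
    for j, q in enumerate(points):
        direct = math.dist(p, q) ** 2
        assert abs(D2[i][j] - direct) < 1e-9
```

In the setting the abstract describes, the direction is reversed: the kernel is fitted from noisy dissimilarities, and coordinates are then read off from it. The identity above is what ties the fitted kernel to a consistent set of pairwise distances.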