LASSO-Patternsearch Algorithm with Application to Ophthalmology and Genomic Data
Abstract: The LASSO-Patternsearch algorithm is proposed as a two-step method to identify clusters or patterns of multiple risk factors for outcomes of interest in demographic and genomic studies. The predictor variables are dichotomous or can be coded as dichotomous. Many diseases are suspected of having multiple interacting risk factors acting in concert, and it is of considerable interest to uncover higher-order interactions or risk patterns when they exist. The patterns considered here are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. Then the patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps. We first applied the method to myopia data from the population-based Beaver Dam Eye Study, exposing physiologically interesting interacting risk factors. In particular, we found that for an older cohort the risk of progression of myopia for smokers is reduced by taking vitamins, while the risk for non-smokers is independent of the “taking vitamins” variable. This is in agreement with the general result that smoking reduces the absorption of vitamins, and certain vitamins have been associated with eye health. We then applied the method to data from a generative model of Rheumatoid Arthritis based on Problem 3 from the Genetic Analysis Workshop 15, successfully demonstrating its potential to efficiently recover higher-order patterns from attribute vectors of a length typical of genomic studies.
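To make the first step of the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the report's implementation. It builds pattern basis functions as products of dichotomous variables and fits an L1-penalized Bernoulli (logistic) likelihood. The solver is plain proximal gradient descent (ISTA), swapped in for illustration in place of the report's novel large-scale algorithm. The function names, the toy data, and the penalty value `lam=0.05` are all hypothetical, and the GACV-based tuning is not reproduced.

```python
import itertools
import math


def pattern_labels(p, max_order):
    """Index tuples for every pattern (product of variables) up to max_order."""
    return [idx for r in range(1, max_order + 1)
            for idx in itertools.combinations(range(p), r)]


def pattern_features(x, labels):
    """A pattern's basis function is 1 only when all of its variables equal 1."""
    return [1.0 if all(x[j] == 1 for j in idx) else 0.0 for idx in labels]


def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_logistic(Z, y, lam, iters=2000):
    """L1-penalized Bernoulli likelihood fit by proximal gradient (ISTA).

    Assumption: ISTA is a stand-in solver, not the report's algorithm.
    The intercept mu is left unpenalized.
    """
    n, m = len(Z), len(Z[0])
    step = 4.0 / (m + 1)  # safe for 0/1 features: Lipschitz constant <= (m + 1) / 4
    mu, c = 0.0, [0.0] * m
    for _ in range(iters):
        g_mu, g = 0.0, [0.0] * m
        for z, yi in zip(Z, y):
            f = mu + sum(cj * zj for cj, zj in zip(c, z))
            r = 1.0 / (1.0 + math.exp(-f)) - yi  # fitted probability minus outcome
            g_mu += r
            for j in range(m):
                g[j] += r * z[j]
        mu -= step * g_mu / n
        c = [soft_threshold(cj - step * gj / n, step * lam) for cj, gj in zip(c, g)]
    return mu, c


def toy_data():
    """Hypothetical data: risk is 0.9 when x0 = x1 = 1 and 0.1 otherwise."""
    X, y = [], []
    for x in itertools.product([0, 1], repeat=4):
        ones = 9 if x[0] == 1 and x[1] == 1 else 1
        for k in range(10):
            X.append(x)
            y.append(1 if k < ones else 0)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    labels = pattern_labels(4, 2)
    Z = [pattern_features(x, labels) for x in X]
    mu, c = lasso_logistic(Z, y, lam=0.05)
    for idx, cj in zip(labels, c):
        print(idx, round(cj, 3))
```

On this toy data the interaction pattern (0, 1) should carry the dominant coefficient, with the remaining patterns shrunk to or near zero. The second step described in the abstract, refitting the surviving patterns in a parametric generalized linear model and pruning them further, is not sketched here.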
Keywords: pattern search, logistic regression, LASSO
Category 1: Applications -- Science and Engineering (Statistics)
Category 2: Convex and Nonsmooth Optimization (Nonsmooth Optimization)
Citation: Technical Report No. 1141, Department of Statistics, University of Wisconsin-Madison, January 2008.
Entry Submitted: 01/05/2008