Boosting methods for variable selection in high dimensional sparse models
Date
2009-08-27
Abstract
First, we propose new variable selection techniques for regression in high dimensional linear models based on forward selection versions of the LASSO, adaptive LASSO, and elastic net, called the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST, and elastic FIRST, respectively. These methods are designed for extremely sparse high dimensional linear regression models. We exploit the fact that the LASSO, adaptive LASSO, and elastic net have closed form solutions when the predictor is one-dimensional. The explicit formula is then applied repeatedly, in an iterative fashion, until convergence. By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of the new estimators is compared with that of commonly used estimators in terms of predictive accuracy and variable selection error, and our approach shows better prediction performance for highly sparse high dimensional linear regression models.
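To make the closed-form step concrete, here is a minimal Python sketch of the idea, not the exact FIRST algorithm from the thesis: each coordinate is refit against the current residual using the explicit one-dimensional LASSO solution (soft-thresholding), and the passes repeat until the coefficients stabilize. The function names and the standardization assumption (columns of X centered, with squared norm n) are illustrative.

```python
import numpy as np

def soft_threshold(z, lam):
    # Closed-form one-dimensional LASSO solution for a standardized predictor.
    return np.sign(z) * max(abs(z) - lam, 0.0)

def first_sketch(X, y, lam, max_iter=100, tol=1e-6):
    # Repeated use of the one-dimensional closed form, assuming each
    # column of X is centered with squared norm n (illustrative setup).
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()              # residual for beta = 0
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(p):
            r += X[:, j] * beta[j]           # add back predictor j's contribution
            z = X[:, j] @ r / n              # one-dimensional least-squares coefficient
            beta[j] = soft_threshold(z, lam) # closed-form LASSO shrinkage
            r -= X[:, j] * beta[j]           # update the residual
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

# Toy usage on a sparse signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
X = (X - X.mean(0)) / X.std(0)               # standardize columns
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(100)
print(np.nonzero(first_sketch(X, y, lam=0.2))[0])   # indices of selected variables
```

FIRST itself adds a forward selection order and a data-driven stopping rule on top of this update; the sketch only shows the repeated application of the one-dimensional solution.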
Second, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared Support Vector Machines or the one-norm Support Vector Machines, called the forward iterative selection and classification algorithm (FISCAL). This method is designed for highly sparse high dimensional binary classification models. We propose squared support vector machines that use the 1-norm and 2-norm penalties simultaneously; the resulting objective is convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied with the squared support vector machines until a stopping rule is satisfied. We also develop a recursive algorithm for FISCAL to reduce the computational burden, and we apply the same procedure to the original one-norm Support Vector Machines. We compare FISCAL with other widely used binary classification approaches with regard to prediction performance and selection accuracy; FISCAL shows competitive prediction performance for highly sparse high dimensional binary classification models.
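For illustration, the following Python sketch solves the one-dimensional building block described above: a squared hinge loss with simultaneous 1-norm and 2-norm penalties, which is convex in the scalar coefficient and non-differentiable only at zero. This is an assumption-laden sketch rather than the thesis's FISCAL algorithm; the function name squared_svm_1d and the use of SciPy's scalar minimizer are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def squared_svm_1d(x, y, lam1, lam2):
    # One-dimensional squared SVM with simultaneous 1-norm and 2-norm
    # penalties (illustrative). x: feature vector, y: labels in {-1, +1}.
    def obj(b):
        # Squared hinge loss plus the two penalties on the scalar coefficient b.
        margin = 1.0 - y * (x * b)
        return (np.mean(np.maximum(margin, 0.0) ** 2)
                + lam1 * abs(b) + lam2 * b ** 2)

    res = minimize_scalar(obj)                        # convex scalar problem
    return res.x if obj(res.x) < obj(0.0) else 0.0    # keep b = 0 if penalties dominate

# Toy usage: one noisy informative feature.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.sign(x + 0.3 * rng.standard_normal(200))
print(squared_svm_1d(x, y, lam1=0.1, lam2=0.1))
```

A forward selection loop would apply this one-dimensional fit to each candidate predictor, add the one that most reduces the loss, and repeat until the stopping rule is met.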
Keywords
regression, high dimensional sparse models, variable selection, binary classification, boosting, lasso, elastic net, support vector machines, gene expression data
Degree
PhD
Discipline
Statistics