Boosting methods for variable selection in high dimensional sparse models

Title: Boosting methods for variable selection in high dimensional sparse models
Author: Hwang, Wook Yeon
Advisors: Hao Helen Zhang, Committee Member
Howard Bondell, Committee Member
Wenbin Lu, Committee Member
Subhashis Ghosal, Committee Chair
Abstract: First, we propose new variable selection techniques for regression in high dimensional linear models based on forward selection versions of the LASSO, adaptive LASSO and elastic net, called the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST and elastic FIRST, respectively. These methods appear to work better for extremely sparse high dimensional linear regression models. We exploit the fact that the LASSO, adaptive LASSO and elastic net have closed form solutions when the predictor is one-dimensional. The explicit formula is then applied repeatedly in an iterative fashion until convergence. By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of our new estimators is compared with that of commonly used estimators in terms of predictive accuracy and errors in variable selection. We observe that our approach has better prediction performance for highly sparse high dimensional linear regression models. Second, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared support vector machine or the one-norm support vector machine, called the forward iterative selection and classification algorithm (FISCAL). This method appears to work better for highly sparse high dimensional binary classification models. We propose a squared support vector machine that uses the 1-norm and the 2-norm simultaneously. The squared support vector machine is convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied with the squared support vector machine until a stopping rule is satisfied. We also develop a recursive algorithm for FISCAL to reduce the computational burden, and we apply the same procedure to the original one-norm support vector machine. We compare FISCAL with other widely used binary classification approaches with regard to prediction performance and selection accuracy. FISCAL shows competitive prediction performance for highly sparse high dimensional binary classification models.
Date: 2009-08-27
Degree: PhD
Discipline: Statistics
URI: http://www.lib.ncsu.edu/resolver/1840.16/4092
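
Illustrative note: the abstract relies on the fact that the LASSO has a closed form solution when there is a single predictor, and that this formula can be applied repeatedly in a forward fashion. The sketch below shows that idea in Python with NumPy. It is not taken from the thesis and should not be read as the FIRST algorithm itself: the function names, the greedy "largest decrease in objective" selection rule, the stopping tolerance `tol`, and the objective scaling 0.5 * RSS + lam * |b| are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0 when |z| <= lam."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def univariate_lasso(x, r, lam):
    """Closed form minimizer of 0.5 * sum((r - b*x)**2) + lam * |b| over scalar b."""
    return soft_threshold(float(x @ r), lam) / float(x @ x)


def forward_iterative_lasso(X, y, lam, max_iter=100, tol=1e-8):
    """Hypothetical greedy scheme: at each step, refit the single coefficient
    whose univariate LASSO update lowers the penalized objective the most."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(max_iter):
        best_j, best_b, best_gain = None, 0.0, tol
        for j in range(p):
            xj = X[:, j]
            # Residual with predictor j's current contribution added back.
            partial = resid + beta[j] * xj
            b = univariate_lasso(xj, partial, lam)
            new_resid = partial - b * xj
            old_obj = 0.5 * float(resid @ resid) + lam * abs(beta[j])
            new_obj = 0.5 * float(new_resid @ new_resid) + lam * abs(b)
            gain = old_obj - new_obj
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain
        if best_j is None:
            # No single-coefficient update improves the objective: stop.
            break
        resid = resid + (beta[best_j] - best_b) * X[:, best_j]
        beta[best_j] = best_b
    return beta
```

With a very sparse signal, only the predictors that reduce the penalized objective ever enter the model, so most coefficients stay exactly zero. The thesis's own fast algorithms and stopping rules may differ from this sketch.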


Files in this item

File: etd.pdf
Size: 454.0 Kb
Format: PDF
