Boosting methods for variable selection in high dimensional sparse models
dc.contributor.advisor | Hao Helen Zhang, Committee Member | en_US |
dc.contributor.advisor | Howard Bondell, Committee Member | en_US |
dc.contributor.advisor | Wenbin Lu, Committee Member | en_US |
dc.contributor.advisor | Subhashis Ghosal, Committee Chair | en_US |
dc.contributor.author | Hwang, Wook Yeon | en_US |
dc.date.accessioned | 2010-04-02T18:44:28Z | |
dc.date.available | 2010-04-02T18:44:28Z | |
dc.date.issued | 2009-08-27 | en_US |
dc.degree.discipline | Statistics | en_US |
dc.degree.level | dissertation | en_US |
dc.degree.name | PhD | en_US |
dc.description.abstract | First, we propose new variable selection techniques for regression in high dimensional linear models based on a forward selection version of the LASSO, adaptive LASSO or elastic net, called, respectively, the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST and elastic FIRST. These methods seem to work better for an extremely sparse high dimensional linear regression model. We exploit the fact that the LASSO, adaptive LASSO and elastic net have closed form solutions when the predictor is one-dimensional. The explicit formula is then applied repeatedly in an iterative fashion until convergence occurs. By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of our new estimators is compared with that of commonly used estimators in terms of predictive accuracy and errors in variable selection. We observe that our approach has better prediction performance for highly sparse high dimensional linear regression models. Second, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared support vector machines or one-norm support vector machines, called the forward iterative selection and classification algorithm (FISCAL). This method seems to work better for a highly sparse high dimensional binary classification model. We propose squared support vector machines that use the 1-norm and 2-norm simultaneously. The squared support vector machines are convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied along with the squared support vector machines until a stopping rule is satisfied. We also develop a recursive algorithm for the FISCAL to reduce the computational burden, and we apply the same procedure to the original one-norm support vector machines.
We compare the FISCAL with other widely used binary classification approaches with regard to prediction performance and selection accuracy. The FISCAL shows competitive prediction performance for highly sparse high dimensional binary classification models. | en_US |
dc.identifier.other | etd-08172009-160237 | en_US |
dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/4092 | |
dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US |
dc.subject | regression | en_US |
dc.subject | high dimensional sparse models | en_US |
dc.subject | variable selection | en_US |
dc.subject | binary classification | en_US |
dc.subject | boosting | en_US |
dc.subject | lasso | en_US |
dc.subject | elastic net | en_US |
dc.subject | support vector machines | en_US |
dc.subject | gene expression data | en_US |
dc.title | Boosting methods for variable selection in high dimensional sparse models | en_US |
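The abstract notes that the LASSO has a closed form solution when the predictor is one-dimensional and that this formula is applied repeatedly in a forward fashion. The sketch below illustrates that general idea only: the one-predictor solution is the standard soft-thresholding formula, while the function names, the selection rule (largest drop in the penalized objective for the increment) and the stopping tolerance are assumptions made for illustration. It is not the dissertation's FIRST algorithm.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def univariate_lasso(x, r, lam):
    """Closed-form minimizer of 0.5 * sum((r_i - b * x_i)^2) + lam * |b|."""
    xr = sum(xi * ri for xi, ri in zip(x, r))
    xx = sum(xi * xi for xi in x)
    return soft_threshold(xr, lam) / xx


def forward_iterative_fit(columns, y, lam, max_iter=100, tol=1e-10):
    """Hypothetical forward loop (an assumption, not the thesis algorithm).

    columns: list of predictor columns, each a list of floats.
    At every pass, fit the current residual on each single predictor with
    the closed-form formula, keep the predictor whose step lowers the
    penalized objective most, and stop when no step helps.
    """
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(max_iter):
        best_j, best_step, best_gain = None, 0.0, tol
        for j, x in enumerate(columns):
            step = univariate_lasso(x, resid, lam)
            xx = sum(xi * xi for xi in x)
            # Objective decrease for this step works out to 0.5 * step^2 * xx.
            gain = 0.5 * step * step * xx
            if gain > best_gain:
                best_j, best_step, best_gain = j, step, gain
        if best_j is None:
            break  # no single-predictor step improves the objective
        beta[best_j] += best_step
        resid = [ri - best_step * xi
                 for ri, xi in zip(resid, columns[best_j])]
    return beta


if __name__ == "__main__":
    cols = [[1.0, 0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0, 1.0],
            [1.0, 1.0, -1.0, -1.0]]
    y = [3.0, 0.0, 3.0, 0.0]
    print(forward_iterative_fit(cols, y, lam=1.0))
```

In the toy call, only the first predictor is correlated with the response, so it is the only one that receives a nonzero coefficient, and that coefficient is shrunk below the least squares value of 3 by the penalty.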