Boosting methods for variable selection in high dimensional sparse models

dc.contributor.advisor: Hao Helen Zhang, Committee Member
dc.contributor.advisor: Howard Bondell, Committee Member
dc.contributor.advisor: Wenbin Lu, Committee Member
dc.contributor.advisor: Subhashis Ghosal, Committee Chair
dc.contributor.author: Hwang, Wook Yeon
dc.date.accessioned: 2010-04-02T18:44:28Z
dc.date.available: 2010-04-02T18:44:28Z
dc.date.issued: 2009-08-27
dc.degree.discipline: Statistics
dc.degree.level: dissertation
dc.degree.name: PhD
dc.description.abstract: Firstly, we propose new variable selection techniques for regression in high dimensional linear models based on forward selection versions of the LASSO, adaptive LASSO, and elastic net, respectively called the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST, and elastic FIRST. These methods appear to work better for extremely sparse high dimensional linear regression models. We exploit the fact that the LASSO, adaptive LASSO, and elastic net have closed-form solutions when the predictor is one-dimensional. The explicit formula is then applied repeatedly in an iterative fashion until convergence occurs (see the illustrative sketch following this record). By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of the new estimators is compared with that of commonly used estimators in terms of predictive accuracy and errors in variable selection. We observe that our approach has better prediction performance for highly sparse high dimensional linear regression models. Secondly, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared Support Vector Machines or one-norm Support Vector Machines, called the forward iterative selection and classification algorithm (FISCAL). This method appears to work better for highly sparse high dimensional binary classification models. We propose squared support vector machines that use the 1-norm and 2-norm penalties simultaneously; these are convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied with the squared support vector machines until a stopping rule is satisfied. We also develop a recursive algorithm for the FISCAL to reduce its computational burden, and we apply the same procedure to the original one-norm Support Vector Machines. We compare the FISCAL with other widely used binary classification approaches with regard to prediction performance and selection accuracy. The FISCAL shows competitive prediction performance for highly sparse high dimensional binary classification models.
dc.identifier.other: etd-08172009-160237
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/4092
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
dc.subject: regression
dc.subject: high dimensional sparse models
dc.subject: variable selection
dc.subject: binary classification
dc.subject: boosting
dc.subject: lasso
dc.subject: elastic net
dc.subject: support vector machines
dc.subject: gene expression data
dc.title: Boosting methods for variable selection in high dimensional sparse models
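
The abstract relies on the closed-form solution of the one-dimensional LASSO (soft-thresholding), applied repeatedly until convergence. The sketch below illustrates that general idea with a plain coordinate-wise soft-thresholding loop in Python; it is not the dissertation's FIRST algorithm, and the function names, the fixed penalty lam, the unit-norm scaling of the columns of X, and the stopping rule are all illustrative assumptions.

    # A minimal sketch, assuming standardized columns and a fixed penalty lam;
    # this is a generic coordinate-wise soft-thresholding loop, NOT the
    # dissertation's FIRST algorithm.
    import numpy as np

    def soft_threshold(z, lam):
        """Closed-form one-dimensional LASSO solution: sign(z) * max(|z| - lam, 0)."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def coordinate_lasso_sketch(X, y, lam=0.1, max_iter=100, tol=1e-6):
        """Repeatedly apply the one-predictor closed form to the current residual.

        Columns of X are assumed scaled so that X[:, j] @ X[:, j] == 1, which
        makes the one-dimensional least-squares estimate simply X[:, j] @ r.
        """
        p = X.shape[1]
        beta = np.zeros(p)
        r = y.astype(float).copy()               # residual of the current fit
        for _ in range(max_iter):
            beta_old = beta.copy()
            for j in range(p):
                # one-dimensional problem in coordinate j, others held fixed
                z = X[:, j] @ r + beta[j]         # partial least-squares estimate
                b_new = soft_threshold(z, lam)    # explicit one-dimensional LASSO formula
                r -= X[:, j] * (b_new - beta[j])  # incremental residual update
                beta[j] = b_new
            if np.max(np.abs(beta - beta_old)) < tol:  # estimates have stabilized
                break
        return beta

The incremental residual update is the kind of bookkeeping the abstract alludes to when it mentions exploiting the relationship between estimators at successive stages to obtain fast algorithms; the adaptive LASSO and elastic net variants would change only the one-dimensional closed form used inside the inner loop.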

Files

Original bundle

Name: etd.pdf
Size: 454.08 KB
Format: Adobe Portable Document Format
