Variable Selection in Multiclass Support Vector Machine and Applications in Genomic Data Analysis

dc.contributor.advisor: Dr. Zhao-Bang Zeng, Committee Chair [en_US]
dc.contributor.advisor: Dr. Hao Helen Zhang, Committee Co-Chair [en_US]
dc.contributor.author: Huang, Lingkang [en_US]
dc.date.accessioned: 2010-04-02T19:10:06Z
dc.date.available: 2010-04-02T19:10:06Z
dc.date.issued: 2009-03-04 [en_US]
dc.degree.discipline: Bioinformatics [en_US]
dc.degree.discipline: Statistics [en_US]
dc.degree.level: dissertation [en_US]
dc.degree.name: PhD [en_US]
dc.description.abstract: Microarray techniques provide new insights into cancer diagnosis using gene expression profiles. Molecular diagnosis based on high-throughput genomic data sets presents a major challenge because of the overwhelming number of variables and the complex multi-class nature of tumor samples. In this thesis, the author first tackled a multi-class problem related to predicting the severity of liver toxicity using Random Forest and GEMS-SVM (Gene Expression Model Selector using Support Vector Machine). However, the original SVM regularization formulation does not accommodate variable selection. Most existing approaches, including GEMS-SVM, handle this issue by selecting genes prior to classification; because the genes are selected by univariate ranking, this does not account for the correlation among genes. The author therefore developed new multi-class SVM (MSVM) approaches that perform multi-class classification and variable selection simultaneously and learn optimal classifiers by considering all classes and all genes at the same time. The original multi-class SVM proposed by Crammer and Singer (2001) does not perform variable selection. Using the MSVM loss function of Crammer and Singer (2001), the author developed new variable selection approaches for both linear and nonlinear classification problems. For linear classification problems, four different sparse regularization terms were each included in the objective function in turn. For nonlinear classification problems, two approaches were developed: the first builds nonlinear MSVMs via basis function transformation, and the second builds nonlinear MSVMs via kernel functions. The newly developed methods were applied to both simulated and real data sets.
The results demonstrated that the new methods select a much smaller number of genes than other existing methods while achieving high classification accuracy in predicting tumor subtypes. The combination of high accuracy and a small number of genes makes the new methods powerful tools for disease diagnostics based on expression data and for identifying targets for therapeutic intervention. [en_US]
dc.identifier.other: etd-02262008-213801 [en_US]
dc.identifier.uri: http://www.lib.ncsu.edu/resolver/1840.16/5240
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: multi-class classification [en_US]
dc.subject: support vector machine [en_US]
dc.subject: microarray [en_US]
dc.subject: variable selection [en_US]
dc.title: Variable Selection in Multiclass Support Vector Machine and Applications in Genomic Data Analysis [en_US]

Files

Original bundle

Name: etd.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format
