Regression via Clustering using Dirichlet Mixtures
| dc.contributor.advisor | Hao H. Zhang, Committee Member | en_US |
| dc.contributor.advisor | Subhashis Ghosal, Committee Chair | en_US |
| dc.contributor.advisor | John F. Monahan, Committee Member | en_US |
| dc.contributor.advisor | Sujit K. Ghosh, Committee Member | en_US |
| dc.contributor.author | Kang, Changku | en_US |
| dc.date.accessioned | 2010-04-02T18:37:15Z | |
| dc.date.available | 2010-04-02T18:37:15Z | |
| dc.date.issued | 2005-12-06 | en_US |
| dc.degree.discipline | Statistics | en_US |
| dc.degree.level | dissertation | en_US |
| dc.degree.name | PhD | en_US |
| dc.description.abstract | Regression analysis is a fundamental problem of statistics. When the regression function has an unknown form, parametric analysis is sometimes inappropriate. In such a situation, the regression function should be estimated by nonparametric methods. Often, the regressor variable is sampled from several different subpopulations and the regression function has different forms depending on the source. The labels of these source subpopulations are not observable. Although a nonparametrically specified regression function can capture the overall regression function, nonparametric regression estimates usually depend on the assumption of homoscedasticity of additive errors. If the underlying distribution of X has unknown clusters, then this usual assumption of homoscedasticity does not hold. To estimate the regression function, we propose first finding clusters in the regressor variables with a Dirichlet mixture to impute the lost subpopulation labels. A standard regression method such as linear or polynomial regression may then be used within each cluster. A Markov chain Monte Carlo (MCMC) sampling method is used to find the clusters, and an estimated regression function can be obtained for each sample. We also apply our method to the large p, small n problem, where the number of variables p is much greater than the number of samples n. In several simulation experiments, our method is compared with other methods: kernel regression and smoothing splines in the univariate case, and GAM (generalized additive models) and MARS (Multivariate Adaptive Regression Splines) in the multivariate case. The consistency issue is discussed without explicit proof. | en_US |
| dc.identifier.other | etd-11022005-230329 | en_US |
| dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/3822 | |
| dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US |
| dc.subject | Bayesian | en_US |
| dc.subject | clustering | en_US |
| dc.subject | Dirichlet mixtures | en_US |
| dc.title | Regression via Clustering using Dirichlet Mixtures | en_US |
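The two-stage idea in the abstract (cluster the regressor with a Dirichlet mixture via MCMC, then fit a simple regression inside each cluster for every sample) can be illustrated with a minimal sketch. This is not the dissertation's implementation; it assumes a univariate regressor, a conjugate Dirichlet process mixture of normals with a known within-cluster variance `sigma2` and a normal base measure N(`mu0`, `tau2`), a collapsed Gibbs sampler over cluster labels, and a straight-line fit within each cluster. All function names and default parameter values are hypothetical choices for illustration.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x (vectorized over mean and var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def dp_gibbs_sweep(x, z, alpha, sigma2, mu0, tau2, rng):
    """One collapsed Gibbs sweep over cluster labels z.

    Assumed model (illustrative): x_i | cluster k ~ N(theta_k, sigma2) with
    sigma2 known, theta_k ~ N(mu0, tau2), and a DP prior with concentration alpha.
    """
    for i in range(len(x)):
        z[i] = -1  # temporarily remove point i from its cluster
        labels = np.unique(z[z >= 0])
        counts = np.array([np.sum(z == k) for k in labels])
        sums = np.array([x[z == k].sum() for k in labels])
        # Conjugate posterior of each cluster mean given its remaining members
        prec = 1.0 / tau2 + counts / sigma2
        post_mean = (mu0 / tau2 + sums / sigma2) / prec
        # Existing cluster: size times posterior predictive density of x_i
        w_old = counts * normal_pdf(x[i], post_mean, 1.0 / prec + sigma2)
        # New cluster: alpha times prior predictive density of x_i
        w_new = alpha * normal_pdf(x[i], mu0, tau2 + sigma2)
        w = np.append(w_old, w_new)
        choice = rng.choice(len(w), p=w / w.sum())
        z[i] = labels[choice] if choice < len(labels) else z.max() + 1
    return z


def fit_within_clusters(x, y, z):
    """Fit a straight line in each cluster and return fitted values at x."""
    fitted = np.empty_like(y, dtype=float)
    for k in np.unique(z):
        idx = z == k
        if np.unique(x[idx]).size >= 2:
            slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
            fitted[idx] = intercept + slope * x[idx]
        else:
            fitted[idx] = y[idx].mean()  # too few distinct points for a line
    return fitted


def regression_via_clustering(x, y, alpha=1.0, sigma2=0.25, mu0=0.0,
                              tau2=10.0, n_iter=200, burn_in=100, seed=0):
    """Average the within-cluster fits over post-burn-in MCMC samples."""
    rng = np.random.default_rng(seed)
    z = np.zeros(len(x), dtype=int)  # start with all points in one cluster
    total = np.zeros(len(x))
    for it in range(n_iter):
        z = dp_gibbs_sweep(x, z, alpha, sigma2, mu0, tau2, rng)
        if it >= burn_in:
            total += fit_within_clusters(x, y, z)
    return total / (n_iter - burn_in)


if __name__ == "__main__":
    # Two hypothetical subpopulations with different linear regression functions
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-3, 0.5, 50), rng.normal(3, 0.5, 50)])
    truth = np.where(x < 0, 1 + 2 * x, 5 - x)
    y = truth + rng.normal(0, 0.3, x.size)
    fitted = regression_via_clustering(x, y)
    print("MSE against true regression function:", np.mean((fitted - truth) ** 2))
```

Averaging the fitted values over the retained samples is one simple way to combine the per-sample regression estimates; polynomial fits, a richer base measure, or a non-conjugate sampler could be swapped in without changing the overall structure.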
