Regression via Clustering using Dirichlet Mixtures

Title: Regression via Clustering using Dirichlet Mixtures
Author: Kang, Changku
Advisors: Hao H. Zhang, Committee Member
Subhashis Ghosal, Committee Chair
John F. Monahan, Committee Member
Sujit K. Ghosh, Committee Member
Abstract: Regression analysis is a fundamental problem in statistics. When the regression function has an unknown form, parametric analysis can be inappropriate, and the regression function should instead be estimated by nonparametric methods. Often, the regressor variable is sampled from several different subpopulations, and the regression function takes a different form depending on the source subpopulation. The labels of these subpopulations are not observable. Although a nonparametrically specified regression function can capture the overall regression function, nonparametric regression estimates usually depend on the assumption of homoscedasticity of the additive errors. If the underlying distribution of X has unknown clusters, this homoscedasticity assumption does not hold. To estimate the regression function, we propose first finding clusters in the regressor variables using a Dirichlet mixture, thereby imputing the lost subpopulation labels. A standard regression method, such as linear or polynomial regression, may then be used within each cluster. A Markov chain Monte Carlo (MCMC) sampling method is used to find the clusters, and an estimated regression function can be obtained for each MCMC sample. We also apply our method to the large p, small n problem, where the number of variables p is much greater than the number of samples n. In several simulation experiments, our method is compared with other methods, such as kernel methods and smoothing splines in the univariate case, and GAM (generalized additive models) and MARS (multivariate adaptive regression splines) in the multivariate case. The consistency issue is discussed without explicit proof.
Date: 2005-12-06
Degree: PhD
Discipline: Statistics
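
Illustration: the two-stage idea in the abstract (cluster the regressor values with a Dirichlet mixture, then fit a standard regression within each cluster) can be sketched roughly as follows. This is a minimal sketch and not the sampler used in the thesis. It assumes a one-dimensional regressor, normal clusters with a known common variance sigma2, a conjugate N(m0, tau2) prior on the cluster means, and a collapsed Gibbs sampler over Chinese-restaurant-process labels. The function names (crp_gibbs, fit_line, clusterwise_regression) and all hyperparameter values are hypothetical choices made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def crp_gibbs(xs, alpha=1.0, sigma2=0.25, m0=0.0, tau2=25.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D Dirichlet process mixture of normals.

    Assumed model (illustrative only): x | cluster k ~ N(mu_k, sigma2),
    mu_k ~ N(m0, tau2), concentration parameter alpha.
    Returns the cluster labels from the final sweep, i.e. one MCMC sample.
    """
    rng = random.Random(seed)
    labels = [0] * len(xs)          # start with every point in one cluster
    counts = {0: len(xs)}           # cluster sizes
    sums = {0: sum(xs)}             # per-cluster sums of x
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(xs):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Weight of each existing cluster: size times posterior predictive.
            ks, weights = [], []
            for c in counts:
                prec = 1.0 / tau2 + counts[c] / sigma2
                post_mean = (m0 / tau2 + sums[c] / sigma2) / prec
                ks.append(c)
                weights.append(counts[c] * normal_pdf(x, post_mean, sigma2 + 1.0 / prec))
            # Weight of opening a new cluster: alpha times prior predictive.
            ks.append(next_label)
            weights.append(alpha * normal_pdf(x, m0, sigma2 + tau2))
            k_new = rng.choices(ks, weights=weights)[0]
            if k_new == next_label:
                next_label += 1
            labels[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            sums[k_new] = sums.get(k_new, 0.0) + x
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0.0:                  # single point or constant x: flat line
        return ybar, 0.0
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope


def clusterwise_regression(xs, ys, labels):
    """Fit a separate straight line within each imputed cluster."""
    fits = {}
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        fits[k] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    return fits


if __name__ == "__main__":
    # Synthetic data: two hidden subpopulations with different regression lines.
    rng = random.Random(1)
    xs = [rng.gauss(-2.0, 0.3) for _ in range(30)] + [rng.gauss(2.0, 0.3) for _ in range(30)]
    ys = [1 + 2 * x + rng.gauss(0, 0.1) for x in xs[:30]] + \
         [5 - x + rng.gauss(0, 0.1) for x in xs[30:]]
    labels = crp_gibbs(xs)
    for k, (a, b) in sorted(clusterwise_regression(xs, ys, labels).items()):
        print(f"cluster {k}: n={labels.count(k)}, intercept={a:.2f}, slope={b:.2f}")
```

The sketch uses only the labels from the last sweep. Since the abstract notes that an estimated regression function can be obtained for each MCMC sample, one could instead retain the labels from several sweeps and refit within clusters for each retained sample.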

Files: etd.pdf (524.7Kb, PDF)
