Estimating the Number of Clusters in Cluster Analysis


Title: Estimating the Number of Clusters in Cluster Analysis
Author: Dasah, Julius Berry
Advisors: David Dickey, Committee Member
Leonard Stefanski, Committee Co-Chair
Dennis Boos, Committee Chair
Jason Osborne, Committee Member
Abstract: In many applied fields of study such as medicine, psychology, ecology, taxonomy and finance, one has to deal with massive amounts of noisy but structured data. A question that often arises in this context is whether or not the observations in these data fall into some "natural" groups, and if so, how many groups? This dissertation proposes a new quantity, called the maximal jump function, for assessing the number of groups in a data set. The estimated maximal jump function measures the excess transformed distortion attainable by fitting an extra cluster to a data set. By distortion, we mean the average distance between each observation and its nearest cluster center. The distortion $d_g$, in the above sense, is a measure of the error incurred by fitting $g$ clusters to a data set. Three stopping rules based on the maximal jump function are proposed for determining the number of groups in a data set. A new procedure for clustering data sets with a common covariance structure is also introduced. The proposed methods are tested on a wide variety of real data, including DNA microarray data sets, as well as on high-dimensional simulated data possessing numerous "noisy" features/dimensions. Also, to show the effectiveness of the proposed methods, comparisons are made to some well-known clustering methods.
Date: 2007-03-08
Degree: PhD
Discipline: Statistics
URI: http://www.lib.ncsu.edu/resolver/1840.16/4606
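
The distortion described in the abstract, the average distance between each observation and its nearest cluster center, can be illustrated with a minimal sketch. It assumes Euclidean distance and hand-picked centers; the function name `distortion` and the toy data are hypothetical, and the dissertation itself may use a different distance or a fitted clustering. The maximal jump function is not reproduced here because the abstract does not give its definition.

```python
import numpy as np

def distortion(X, centers):
    """Average Euclidean distance from each row of X to its nearest center.

    Assumption: plain Euclidean distance; the dissertation may define
    the distance differently.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances, shape (n_observations, n_centers)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Keep the distance to the nearest center, then average over observations
    return dists.min(axis=1).mean()

# Toy data (illustrative only): two well-separated groups on a line
X = np.array([[0.0], [1.0], [10.0], [11.0]])

d1 = distortion(X, [[5.5]])          # g = 1: a single center at the overall mean
d2 = distortion(X, [[0.5], [10.5]])  # g = 2: one center per group

print(d1, d2)
```

In this toy case the distortion falls sharply when the second center is added, which is the kind of drop that rules for choosing the number of clusters look for.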

