Estimating the Number of Clusters in Cluster Analysis

Dasah, Julius Berry

Files

etd.pdf (1.23 MB)

Date

2007-03-08

Authors

Dasah, Julius Berry

Advisors

David Dickey, Committee Member

Leonard Stefanski, Committee Co-Chair

Dennis Boos, Committee Chair

Jason Osborne, Committee Member

Abstract

In many applied fields of study, such as medicine, psychology, ecology, taxonomy, and finance, one has to deal with massive amounts of noisy but structured data. A question that often arises in this context is whether the observations in these data fall into some "natural" groups and, if so, how many groups there are. This dissertation proposes a new quantity, called the maximal jump function, for assessing the number of groups in a data set. The estimated maximal jump function measures the excess transformed distortion attainable by fitting an extra cluster to a data set. By distortion, we mean the average distance between each observation and its nearest cluster center. Distortion $d_g$, in this sense, measures the error incurred by fitting $g$ clusters to a data set. Three stopping rules based on the maximal jump function are proposed for determining the number of groups in a data set. A new procedure for clustering data sets with a common covariance structure is also introduced. The proposed methods are tested on a wide variety of real data sets, including DNA microarray data, as well as on high-dimensional simulated data possessing numerous "noisy" features/dimensions. To show the effectiveness of the proposed methods, comparisons are made to some well-known clustering methods.
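To make the notions of distortion $d_g$ and "transformed distortion" concrete, the sketch below computes $d_g$ for $g = 1, \dots, G$ with k-means and then applies the jump method of Sugar and James (2003), which picks the $g$ maximizing $J_g = d_g^{-Y} - d_{g-1}^{-Y}$ with $d_0^{-Y} = 0$. This is a related, earlier technique and NOT the maximal jump function proposed in the dissertation, whose exact definition is not given in this abstract. Assumptions: distortion is taken as the mean squared Euclidean distance to the nearest center divided by the dimension $p$ (identity covariance), the transformation power defaults to $Y = p/2$, and the helper names (`sq_dists`, `kmeans`, `distortion`, `jump_estimate`) are hypothetical.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between rows of X and rows of C, shape (n, k)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, g, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns the best centers of n_init runs."""
    rng = np.random.default_rng(seed)
    best_sse, best_centers = np.inf, None
    for _ in range(n_init):
        # k-means++ seeding: pick each new center with probability proportional
        # to its squared distance from the centers chosen so far.
        centers = X[[rng.integers(len(X))]]
        while len(centers) < g:
            d2 = sq_dists(X, centers).min(axis=1)
            centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            # Move each center to the mean of its points (keep it if it lost all points).
            new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(g)])
            if np.allclose(new, centers):
                break
            centers = new
        sse = sq_dists(X, centers).min(axis=1).sum()
        if sse < best_sse:
            best_sse, best_centers = sse, centers
    return best_centers


def distortion(X, centers):
    """Assumed d_g: mean squared distance to the nearest center, per dimension."""
    return sq_dists(X, centers).min(axis=1).mean() / X.shape[1]


def jump_estimate(X, max_g, power=None):
    """Sugar-James jump method: argmax over g of d_g^{-Y} - d_{g-1}^{-Y}."""
    p = X.shape[1]
    Y = p / 2 if power is None else power
    d = np.array([distortion(X, kmeans(X, g)) for g in range(1, max_g + 1)])
    transformed = d ** (-Y)
    # Convention d_0^{-Y} = 0, so the first jump equals d_1^{-Y}.
    jumps = np.diff(np.concatenate(([0.0], transformed)))
    return int(np.argmax(jumps)) + 1, d, jumps


# Usage: three well-separated Gaussian groups in two dimensions.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in means])
g_hat, d, jumps = jump_estimate(X, max_g=6)
print(g_hat, np.round(jumps, 3))
```

The negative power $-Y$ is what makes the jumps comparable across $g$: once $g$ passes the true number of groups, extra clusters only carve up noise, and the transformed distortion grows slowly, so the largest jump tends to occur at the correct $g$. With many noisy dimensions this behaviour can degrade, which is the setting the dissertation targets.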

Keywords

High-dimensional Data, Noise Features, Jump Function, Distortion, Cluster Analysis

URI

http://www.lib.ncsu.edu/resolver/1840.16/4606

Degree

PhD

Discipline

Statistics

Collections

Dissertations
