Generalizations and Unification of Centroid-based Clustering Methods

No Thumbnail Available

Date

2004-12-01

Journal Title

Series/Report No.

Journal ISSN

Volume Title

Publisher

Abstract

There are many clustering methods that are referred to as k-means-like. We give the minimal necessary and sufficient components for the mechanism of the k-means (iterative and partitional) clustering method of a finite set of objects, X. Thus k-means is generalized and the methods that mimic k-means are unified. We name these k-center clustering methods. The fundamental mechanism of k-center methods exposes the usual misconceptions of k-means such as (a) "distance" satisfies some of properties of a mathematical metric, (b) there is a need to measure "distance" between objects in X, and (c) the centers of clusters have the same nature as the objects of X. Moreover, k-center methods have a common formula to choose or calculate centers of clusters. We characterize the convergent common objective function by expressing it in terms of (a) a distance measure for closeness between center objects and the objects in X and (b) the coherence of clusters. We give a three object example to demonstrate the components of the formal mechanism of a k-center method. We then give examples of various known methods that belong to the class of k-center methods. We exhibit an extensive and thorough comparison of the qualitative k-modes and the numerical spherical k-means. Included are paradigm applications, a matrix environment, an understanding of the duality of a dissimilarity and similarity measure, and an understanding of normalized X and the normalized centers of subsets of X.

Description

Keywords

k-means, data mining, cluster analysis

Citation

Degree

MS

Discipline

Computer Science

Collections