60 K-Models Clustering, a Generalization of K-Means Clustering
Published: 2010
K-means operates by selecting initial cluster centers and then iterating two steps: assigning each point to the cluster with the nearest center, and updating the cluster centers. If we regard finding good cluster centers as a statistical parameter estimation problem, then estimating the parameters of other statistical models yields a space of novel clustering methods. In this paper we prototype the idea by replacing the estimation of cluster centers with a least-squares fit of a line to the members of each part of a data partition. The method can accurately reconstruct the lines used to generate a given data set. The sum-of-squared-error statistic is an excellent quality measure for detecting when a superior model has been found. Data sets without a linear structure show that the statistical model or models used must be adapted to the data.
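
The alternation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_lines`, `residuals_sq`, `k_lines`), the use of vertical residuals as the point-to-line distance, the random initial partition, the demo data, and the restart strategy are all assumptions made here for illustration.

```python
import numpy as np

def fit_lines(x, y, labels, lines):
    """Least-squares (slope, intercept) for the members of each cluster."""
    new_lines = lines.copy()
    for j in range(len(lines)):
        members = labels == j
        if members.sum() >= 2:  # fewer than two points: keep the previous line
            new_lines[j] = np.polyfit(x[members], y[members], 1)
    return new_lines

def residuals_sq(x, y, lines):
    """Squared vertical residual of every point to every line, shape (k, n)."""
    pred = lines[:, [0]] * x + lines[:, [1]]
    return (y - pred) ** 2

def k_lines(x, y, k, n_iter=100, seed=0):
    """Alternate between fitting lines and reassigning points (assumed scheme)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(x))  # random initial partition
    lines = np.zeros((k, 2))
    for _ in range(n_iter):
        lines = fit_lines(x, y, labels, lines)                  # estimation step
        new_labels = residuals_sq(x, y, lines).argmin(axis=0)   # assignment step
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    # Sum of squared errors of each point to its assigned line.
    sse = residuals_sq(x, y, lines)[labels, np.arange(len(x))].sum()
    return lines, labels, sse

# Hypothetical demo data: points drawn from two lines plus small noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
true_lines = np.array([[2.0, 1.0], [-1.0, 4.0]])
which = rng.integers(0, 2, size=200)
y = true_lines[which, 0] * x + true_lines[which, 1] + rng.normal(0, 0.1, size=200)

# Restart from several random partitions and keep the lowest-SSE result.
runs = [k_lines(x, y, k=2, seed=s) for s in range(10)]
lines, labels, sse = min(runs, key=lambda r: r[2])
```

Comparing the sum of squared errors across restarts is one simple way to use that statistic as a quality measure, since a single run can settle on a poor partition. Swapping `fit_lines` for another estimator (for example, a cluster mean) would recover ordinary k-means under the same loop.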