ASME Press Select Proceedings

International Conference on Software Technology and Engineering (ICSTE 2012)

Jianhong Zhou
ASME Press

With the rapid expansion of the Internet, the amount of available digital data is growing at a fast pace day by day. This has created a need for effective regulation of the available data. The text data stored in digital libraries are in unstructured form, so easy retrieval, accessibility, and organization of text material have become essential. Among the available text mining methods, clustering is one that is used to organize data effectively. In this paper, an efficient K-means algorithm for clustering text data is proposed. The algorithm includes a procedure for selecting the initial centroids: dissimilar documents are chosen as the initial centroids for K-means. The number of iterations needed to converge is shown to be reduced, and the experimental results show that the proposed algorithm improves performance compared with the simple K-means algorithm.
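The abstract does not spell out how the dissimilar documents are chosen, so what follows is a minimal sketch under stated assumptions, not the author's method. It assumes raw term-frequency vectors (no TF-IDF weighting), cosine similarity with unit-normalized centroids (a spherical K-means variant; the abstract does not say whether cosine or Euclidean distance is used), and a farthest-first (maximin) seeding heuristic swapped in for the paper's procedure: start from the first document, then repeatedly add the document whose highest similarity to the already chosen seeds is lowest. The names `dissimilar_init` and `spherical_kmeans` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(doc, vocab):
    """Raw term-frequency vector of a whitespace-tokenized document (assumption: no TF-IDF)."""
    counts = Counter(doc.lower().split())
    return [float(counts[t]) for t in vocab]


def normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity, valid because every vector here is unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def dissimilar_init(vectors, k):
    """Hypothetical maximin seeding: each new seed is the document least similar to all chosen seeds."""
    chosen = [0]
    while len(chosen) < k:
        best, best_score = None, None
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            # Similarity of document i to its closest already-chosen seed.
            score = max(cosine(v, vectors[j]) for j in chosen)
            if best_score is None or score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


def assign(v, centroids):
    """Index of the centroid most similar to v (first index wins ties)."""
    return max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))


def spherical_kmeans(vectors, init_idx, max_iter=100):
    """K-means with cosine similarity; returns (labels, iterations).

    The iteration count includes the final pass that detects no change in assignments.
    """
    centroids = [list(vectors[i]) for i in init_idx]
    dim = len(vectors[0])
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [assign(v, centroids) for v in vectors]
        if new_labels == labels:
            return labels, iteration
        labels = new_labels
        # Recompute each centroid as the normalized mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(v[d] for v in members) / len(members) for d in range(dim)]
                centroids[c] = normalize(mean)
    return labels, max_iter


docs = [
    "apple banana fruit",
    "banana apple smoothie",
    "car engine wheel",
    "engine car road",
    "fruit salad apple",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [normalize(tf_vector(d, vocab)) for d in docs]
seeds = dissimilar_init(vectors, k=2)
labels, iterations = spherical_kmeans(vectors, seeds)
print(seeds, labels, iterations)  # prints: [0, 2] [0, 0, 1, 1, 0] 2
```

Comparing the returned iteration count against runs seeded with randomly chosen documents is one way to reproduce the kind of convergence comparison the abstract describes; the toy numbers here say nothing about the paper's reported results.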
