International Conference on Software Technology and Engineering (ICSTE 2012)
39 An Approach for Text Clustering Using Modified K-means Algorithm
With the rapid expansion of the internet, the volume of available digital data is growing day by day. This has created a need for effective management of that data. The text data stored in digital libraries is unstructured, so easy retrieval, accessibility, and organization of text material have become essential. Among the available text mining methodologies, clustering is one of the methods used to organize data effectively. In this paper, an efficient K-means algorithm for clustering text data is proposed. The algorithm includes a procedure for selecting the initial centroids: dissimilar documents are chosen as the initial centroids for K-means. The number of iterations required to converge is shown to decrease. The experimental results show that the proposed algorithm improves performance compared to the simple K-means algorithm.
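The abstract does not spell out how the dissimilar documents are chosen, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes TF-IDF weighting, cosine distance, and a greedy farthest-first rule: each new seed is the document whose nearest already-chosen seed is farthest away. The function names `tfidf_vectors`, `cosine_distance`, and `pick_dissimilar_seeds`, and the choice of document 0 as the first seed, are all illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from whitespace-tokenized text."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed IDF (an assumption); weights are always positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is empty."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return 1.0 - dot / (norm_a * norm_b)


def pick_dissimilar_seeds(vectors, k):
    """Greedy farthest-first choice of k mutually dissimilar documents.

    Assumption: the first seed is document 0; the paper may use another rule.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        # Score each candidate by its distance to the closest existing seed,
        # then take the candidate with the largest such distance.
        best = max(
            candidates,
            key=lambda i: min(
                cosine_distance(vectors[i], vectors[s]) for s in seeds
            ),
        )
        seeds.append(best)
    return seeds


docs = [
    "cat dog pet",
    "cat dog animal",
    "stock market trade",
    "stock market price",
    "rain snow weather",
]
seeds = pick_dissimilar_seeds(tfidf_vectors(docs), 3)
print(seeds)  # [0, 2, 4]: one document from each of the three topics
```

The vectors at the returned indices would then serve as the initial centroids for an ordinary K-means loop. Starting from seeds that already lie in different topical regions is the intuition behind the reduced iteration count reported above, although this sketch does not measure it.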