A Method for Reducing the Number of Training Samples in KNN Text Classification
Published: 2011
The K-Nearest Neighbor (KNN) algorithm is simple and stable, and it has therefore been widely used in text classification. However, high-dimensional document vectors and large training sets seriously degrade both the accuracy and the efficiency of classification. To address these shortcomings, a new sample selection method based on spanning-tree document clustering is presented. Its basic idea is that the document samples of each category are automatically divided into clusters by spanning-tree document clustering. Within each category, sub-trees are generated, and each sub-tree groups documents belonging to the same sub-category. Each sub-tree is then cut according to node density. Because typical samples are retained while the number of training samples is reduced, the remaining training samples are highly representative. Experimental results show that the method not only improves classification efficiency but also improves classification accuracy to some extent.
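The abstract does not specify how the spanning tree is built, how sub-trees are separated, or how node density is measured, so the following Python sketch is only one possible reading under stated assumptions, not the authors' method. It assumes: a minimum spanning tree over cosine distances within each category (Prim's algorithm); sub-trees obtained by cutting MST edges longer than a threshold; and "node density" defined as the number of same-sub-tree documents within a radius, used to keep the densest (most typical) documents of each sub-tree. The function names and the parameters `cut_threshold`, `radius` and `keep_ratio` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


def minimum_spanning_tree(vectors):
    """Prim's algorithm on the complete graph of one category; returns (u, v, weight) edges."""
    n = len(vectors)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = cosine_distance(vectors[u], vectors[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_into_subtrees(n, edges, cut_threshold):
    """Assumed rule: drop MST edges longer than cut_threshold; components are the sub-trees."""
    adj = defaultdict(list)
    for u, v, w in edges:
        if w <= cut_threshold:
            adj[u].append(v)
            adj[v].append(u)
    seen, subtrees = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        subtrees.append(comp)
    return subtrees


def node_density(i, members, vectors, radius):
    """Assumed density: number of same-sub-tree documents within `radius` of document i."""
    return sum(1 for j in members
               if j != i and cosine_distance(vectors[i], vectors[j]) <= radius)


def reduce_training_set(samples, cut_threshold=0.5, radius=0.3, keep_ratio=0.5):
    """samples: list of (vector, label); keeps the densest documents of every sub-tree."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    reduced = []
    for label, vectors in by_label.items():
        edges = minimum_spanning_tree(vectors)
        for members in split_into_subtrees(len(vectors), edges, cut_threshold):
            ranked = sorted(members, reverse=True,
                            key=lambda i: node_density(i, members, vectors, radius))
            keep = max(1, math.ceil(keep_ratio * len(members)))
            reduced.extend((vectors[i], label) for i in ranked[:keep])
    return reduced


def knn_classify(query, samples, k=3):
    """Plain majority-vote KNN with cosine distance."""
    nearest = sorted(samples, key=lambda s: cosine_distance(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 2-D "document vectors" (real ones would be high-dimensional, e.g. TF-IDF).
train = [
    ([1.0, 0.1], "sports"), ([0.9, 0.2], "sports"),
    ([1.0, 0.0], "sports"), ([0.95, 0.15], "sports"),
    ([0.1, 1.0], "politics"), ([0.2, 0.9], "politics"),
    ([0.0, 1.0], "politics"), ([0.15, 0.95], "politics"),
]
reduced = reduce_training_set(train)
print(len(train), "->", len(reduced))       # 8 -> 4
print(knn_classify([1.0, 0.05], reduced))   # sports
```

Prim's O(n²) variant fits here because the distance graph within a category is complete. Cutting the longest MST edges is the usual way to split a spanning tree into clusters (it matches single-linkage clustering); the paper's actual cut criterion may differ.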