7 Automatic Classification of Persian Texts Employing Keywords
The scarcity of classified texts in many languages, Persian in particular, is one of the main obstacles to developing text information retrieval and statistical machine translation systems. This paper introduces a new method for classifying Persian texts. In the first step, keywords for different parts of the texts are extracted with a probabilistic method from the classified documents of Wikipedia. The texts are then assigned to their specialized fields by two machine learning methods, based on the k-nearest neighbor and decision tree algorithms. The evaluation results indicate that the method is suitable for classifying Persian texts, and that the nearest neighbor method is better suited to this task than the decision tree method.
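The two-step pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the probabilistic scoring or the feature representation, so everything here is an assumption made for illustration: the score P(w|c) · log(P(w|c) / P(w)), the keyword-count feature vectors, the whitespace tokenizer, the English toy corpus, and all function names are hypothetical and are not the authors' implementation. Only the k-nearest neighbor step is shown; the decision tree variant is omitted.

```python
from collections import Counter, defaultdict
import math


def extract_keywords(labeled_docs, top_n=3):
    """Pick top_n keywords per category (assumed score, not the paper's).

    score(w, c) = P(w | c) * log(P(w | c) / P(w)), estimated from raw counts.
    """
    cat_counts = defaultdict(Counter)
    total = Counter()
    for text, cat in labeled_docs:
        words = text.split()  # naive tokenizer; real Persian text needs normalization
        cat_counts[cat].update(words)
        total.update(words)
    n_total = sum(total.values())

    keywords = {}
    for cat, counts in cat_counts.items():
        n_cat = sum(counts.values())
        scores = {}
        for w, c in counts.items():
            p_w_given_c = c / n_cat
            p_w = total[w] / n_total
            scores[w] = p_w_given_c * math.log(p_w_given_c / p_w)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[cat] = ranked[:top_n]
    return keywords


def featurize(text, keywords):
    """One feature per category: how many of its keywords occur in the text."""
    words = Counter(text.split())
    return [sum(words[k] for k in keywords[cat]) for cat in sorted(keywords)]


def knn_classify(text, labeled_docs, keywords, k=3):
    """Majority vote among the k training documents nearest in feature space."""
    query = featurize(text, keywords)
    neighbors = sorted(
        (math.dist(query, featurize(doc, keywords)), cat)
        for doc, cat in labeled_docs
    )
    votes = Counter(cat for _, cat in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy stand-in for category-labeled Wikipedia articles (hypothetical data).
train = [
    ("atom energy particle energy", "physics"),
    ("quantum particle atom field", "physics"),
    ("team goal match player", "sports"),
    ("player match league goal", "sports"),
]

keywords = extract_keywords(train)
label = knn_classify("particle energy atom", train, keywords)
```

In this sketch a document is reduced to one count per category, which keeps the distance computation trivial; a fuller system would more plausibly use one feature per keyword and a held-out test set to compare the two classifiers.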