ASME Press Select Proceedings
International Conference on Advanced Computer Theory and Engineering (ICACTE 2009)
By Xie Yi
ISBN: 9780791802977
No. of Pages: 2012
Publisher: ASME Press
Publication date: 2009

Multilingual corpora are becoming an essential resource for multilingual natural language processing. This paper investigates the effects of applying a clustering technique to parallel multilingual texts. Of particular interest are the differences in the cluster mappings and in the tree structures of the clusters. The effect of reducing the set of terms considered when clustering parallel corpora is also studied. A genetic-based algorithm is then applied to optimize the weights of the terms used in clustering, so that unseen documents can be classified. Specifically, the paper introduces the tools necessary for this task and presents a set of experimental results together with the issues that became apparent.
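As a rough illustration of what comparing cluster mappings on parallel texts can involve, the sketch below clusters two aligned toy corpora separately and measures how far the two groupings agree. It is not the method of the chapter, whose full text is not reproduced here. The toy sentences, the term-frequency vectors, the cosine similarity, the single-link merging rule, the Rand index, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term frequencies (whitespace tokenisation, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, k):
    """Single-link agglomerative clustering down to k clusters.

    Returns one cluster label per document, in document order.
    """
    vecs = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        # Merge the two clusters whose closest members are most similar.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(cosine(vecs[i], vecs[j])
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(docs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(x, y):
    """Fraction of document pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(x)), 2))
    agree = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical aligned corpora: document i in one list translates document i in the other.
english = [
    "the bank raised interest rates",
    "interest rates fell at the bank",
    "the team won the football match",
    "the football team lost the match",
]
french = [
    "la banque augmente les taux",
    "les taux de la banque baissent",
    "le club gagne le match de football",
    "le club de football perd le match",
]

en_labels = cluster(english, k=2)
fr_labels = cluster(french, k=2)
print(en_labels, fr_labels, rand_index(en_labels, fr_labels))
```

A Rand index below 1 would indicate that some pairs of aligned documents are grouped together in one language but separated in the other, which is one simple way to quantify a difference between cluster mappings.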

Abstract
Key Words
1. Introduction
2. Clustering Parallel Corpora
3. Clustering Documents with a Set of Reduced Terms
4. A Semi-Supervised Clustering Based on Reduced Terms
5. Experimental Results
6. Conclusion
References
This content is only available via PDF.