ASME Press Select Proceedings

International Conference on Information Technology and Computer Science, 3rd (ITCS 2011)

Editors
V. E. Muhin, National Technical University of Ukraine
W. B. Hu, Wuhan University
ISBN: 9780791859742
No. of Pages: 656
Publisher: ASME Press
Publication date: 2011

Document clustering helps internet users cope with large numbers of search results. Most clustering techniques rely on statistical proximity or dependency between single terms in the documents. Because phrases typically represent the concepts expressed in text more accurately than single terms, a phrase-based document similarity measure can achieve higher clustering accuracy. This paper presents a phrase-based hierarchical clustering method for search engine results. The method consists mainly of a phrase-based document similarity measure and an improved hierarchical clustering algorithm. The similarity measure is motivated by a measure of semantic relatedness, the Extended Gloss Overlaps Measure. It extracts matching phrases using a novel phrase-based document index model, the Document Index Graph (DIG). To emphasize the effect of these phrases, it assigns each matching phrase a score much greater than the sum of the scores assigned to its constituent terms. An improved hierarchical clustering algorithm (IHCA) is then proposed to cluster the search results. It seeks and merges eligible mutual nearest neighbor pairs at each level of the hierarchy. When the state of mutual nearest neighbor pairs is stable, the intermediate results are clustered sequentially.
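
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the scoring rule here (a shared phrase of n words contributes n squared, in the spirit of gloss-overlap scoring) is an assumption, the Document Index Graph is replaced by a plain longest-common-run search, and the function names and the `min_sim` eligibility threshold are hypothetical.

```python
def phrase_overlap_score(doc_a, doc_b):
    """Score shared phrases; a phrase of n words adds n**2 (assumed rule)."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    score = 0
    while True:
        best_len, best_i, best_j = 0, 0, 0
        # Longest common contiguous word run via dynamic programming.
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                # Masked (None) words never match again.
                if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len, best_i, best_j = cur[j], i, j
            prev = cur
        if best_len == 0:
            return score
        score += best_len ** 2
        # Mask the matched phrase in both documents so it is counted once.
        a[best_i - best_len:best_i] = [None] * best_len
        b[best_j - best_len:best_j] = [None] * best_len


def mutual_nearest_pairs(sim, min_sim=0.0):
    """Return pairs (i, j), i < j, that are each other's most similar item.

    `min_sim` is a hypothetical stand-in for the paper's eligibility test.
    """
    n = len(sim)
    if n < 2:
        return []
    nearest = [
        max((j for j in range(n) if j != i), key=lambda j: sim[i][j])
        for i in range(n)
    ]
    return [
        (i, j)
        for i, j in enumerate(nearest)
        if i < j and nearest[j] == i and sim[i][j] >= min_sim
    ]


docs = [
    "document clustering of search results",
    "fast document clustering for search results",
    "hierarchical clustering algorithm",
    "improved hierarchical clustering algorithm",
]
sim = [[phrase_overlap_score(x, y) for y in docs] for x in docs]
pairs = mutual_nearest_pairs(sim)
```

In this toy run the first two snippets share the two-word phrases "document clustering" and "search results", so they score 4 + 4 = 8 rather than the 4 that single-term matching would give. A full hierarchical pass would merge the returned pairs, recompute similarities between the merged clusters, and repeat until the set of mutual nearest neighbor pairs stops changing.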
