International Conference on Instrumentation, Measurement, Circuits and Systems (ICIMCS 2011)
309 Evaluating Semantic Relatedness Using Wikipedia-Based Representative Features Analysis
To evaluate the semantic relatedness of natural language concepts automatically, we propose Representative Features Analysis (RFA), a novel approach that represents the meaning of a concept as a semantic-surrounding concept vector in a high-dimensional space of representative features. The vector elements are weighted by combining the TF-IDF scheme with the link status of the Concept Interpreting Network, in which nodes represent concepts and edges represent the interpreting relations between them. Assessing relatedness then amounts to comparing the corresponding vectors using conventional metrics. Compared with the previous state of the art, RFA substantially improves the correlation of computed relatedness scores with human judgments, to 0.78 for concepts, and it recalls the top n relevant concepts better than the ESA method. Importantly, the RFA model can evaluate semantic similarity for concepts that occur rarely in Wikipedia articles, and it eliminates the negative effect of meaningless word occurrences in Wikipedia articles, which the ESA approach neglects.
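
The general idea of weighting feature vectors and comparing them with a conventional metric can be illustrated with a minimal sketch. It is not the RFA implementation: the abstract does not specify how TF-IDF and link status are combined, so the multiplicative boost `alpha`, the function names, and the toy data below are all assumptions made for illustration, and cosine similarity is taken as one example of a conventional metric.

```python
import math
from collections import Counter


def tf_idf_vectors(feature_docs):
    """Build one sparse TF-IDF vector per concept.

    feature_docs maps a concept to the list of feature concepts found in
    its interpreting text (hypothetical toy input, not Wikipedia data).
    """
    n = len(feature_docs)
    # Document frequency: number of concepts whose text contains the feature.
    df = Counter()
    for feats in feature_docs.values():
        df.update(set(feats))
    vectors = {}
    for concept, feats in feature_docs.items():
        tf = Counter(feats)
        total = len(feats)
        # Smoothed IDF (log(n / df) + 1) keeps every weight positive.
        vectors[concept] = {
            f: (count / total) * (math.log(n / df[f]) + 1.0)
            for f, count in tf.items()
        }
    return vectors


def apply_link_weights(vectors, links, alpha=0.5):
    """Boost features that are also linked from the concept.

    Assumption: a linked feature is multiplied by (1 + alpha). The source
    does not state the actual combination rule.
    """
    weighted = {}
    for concept, vec in vectors.items():
        linked = links.get(concept, set())
        weighted[concept] = {
            f: w * (1.0 + alpha) if f in linked else w for f, w in vec.items()
        }
    return weighted


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data, invented for illustration only.
feature_docs = {
    "car": ["vehicle", "engine", "wheel", "road"],
    "truck": ["vehicle", "engine", "cargo", "road"],
    "banana": ["fruit", "yellow", "plant"],
}
links = {
    "car": {"vehicle", "engine"},
    "truck": {"vehicle", "cargo"},
    "banana": {"fruit"},
}

vecs = apply_link_weights(tf_idf_vectors(feature_docs), links)
print("car vs truck :", cosine(vecs["car"], vecs["truck"]))
print("car vs banana:", cosine(vecs["car"], vecs["banana"]))
```

In this toy setting, concepts that share weighted features score higher than concepts with disjoint features, which is the behaviour the vector comparison step relies on.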