ASME Press Select Proceedings
International Conference on Computer Technology and Development, 3rd (ICCTD 2011)
By Jianhong Zhou
ISBN: 9780791859919
No. of Pages: 2000
Publisher: ASME Press
Publication date: 2011

Data source selection is one of the most important steps in domain thematic word extraction. Most previous work has focused on how to extract keywords from an existing corpus with good algorithms, while very little research has addressed how to find good data sources for collecting the text corpus. This paper investigates how to use online web tools to identify high-quality data sources. Then, considering the characteristics of subject keywords, we propose an improved TF-IDF weight calculation formula for keyword ranking, and extract domain keywords from the documents by recalculating the weights of candidate words with the improved method. Finally, taking the field of Chinese herbal medicine as an example, our results show that the proposed method achieves substantially higher accuracy and a higher recall rate at much lower cost.
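
For orientation, the standard TF-IDF weighting that the abstract takes as its starting point can be sketched as follows. This is only the textbook baseline (term frequency times log inverse document frequency), not the improved formula proposed in the paper, which is not given on this page. The function names `tf_idf` and `top_keywords`, the max-over-documents ranking rule, and the toy token lists are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Baseline TF-IDF. `docs` is a list of token lists.

    Returns one dict per document mapping term -> weight, where
    weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_keywords(docs, k=3):
    """Rank candidate terms by their highest TF-IDF weight in any document
    (an illustrative ranking rule, not the paper's)."""
    scores = Counter()
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            scores[term] = max(scores[term], weight)
    return [term for term, _ in scores.most_common(k)]

# Toy, made-up token lists; real input would be segmented domain documents.
docs = [["ginseng", "root", "herb"], ["herb", "tea"], ["herb", "ginseng"]]
print(top_keywords(docs, k=2))
```

In this toy corpus a term that occurs in every document (here "herb") gets weight zero, which is the behavior a domain-specific reweighting would typically aim to adjust for subject keywords.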

Abstract
Key Words
1. Introduction
2. Basic Ideas
3. The Strategy of Subject Terms Extraction
4. Experimental Setup
5. Concluding Remarks and Further Work
References