ASME Press Select Proceedings
International Conference on Advanced Computer Theory and Engineering, 5th (ICACTE 2012)
Editor: Yi, X.
ISBN: 9780791860045
No. of Pages: 938
Publisher: ASME Press
Publication date: 2012
eBook Chapter
76 Data Preprocessing in Web Text Mining
By Jiang, Y.
Page Count: 10
Published: 2012
Citation
Jiang, Y. "Data Preprocessing in Web Text Mining." International Conference on Advanced Computer Theory and Engineering, 5th (ICACTE 2012). Ed. Yi, X. ASME Press, 2012.
The abundance of information on the WWW and people's need for high-quality information have accelerated the development of highly efficient and effective search engines. Web text mining is one of the key techniques behind search engines. However, Web data is highly complex, which makes Web text mining more difficult. To obtain good mining results, Web pages must be preprocessed before any text mining begins. Given a set of pages collected by a search engine's robot (crawler), we discuss the essential work needed to represent pages as vectors, such as term selection and term weighting. The purpose is to prepare the data for the subsequent Web text mining task.
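The pipeline described in the abstract (turning crawled pages into term vectors through term selection and weighting) can be sketched as follows. This is not the chapter's own method, which is not reproduced here; it assumes two common choices: selecting terms by a document-frequency threshold and weighting them with tf-idf, i.e. tf × log(N / df), followed by L2 normalization. The stop-word list, the function names `page_to_terms`, `select_terms` and `tfidf_vectors`, and the `min_df` parameter are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class _TextExtractor(HTMLParser):
    """Collects visible text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of nested script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_to_terms(html):
    """Strip markup, lowercase, tokenize on letters, and drop stop words."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def select_terms(docs, min_df=2):
    """Term selection (assumed): keep terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, n in df.items() if n >= min_df)


def tfidf_vectors(docs, vocabulary):
    """Weight each selected term by tf * log(N / df), then L2-normalize each vector."""
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    df = {t: sum(1 for s in doc_sets if t in s) for t in vocabulary}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in vec))
        vectors.append([w / norm if norm else 0.0 for w in vec])
    return vectors


# Example: three toy "crawled" pages.
pages = [
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Web text mining and search engines</p></body></html>",
    "<html><body><h1>Search engines</h1><p>crawl the Web</p>"
    "<script>var x = 1;</script></body></html>",
    "<html><body><p>Text mining of Web pages</p></body></html>",
]
docs = [page_to_terms(p) for p in pages]
vocab = select_terms(docs, min_df=2)
vectors = tfidf_vectors(docs, vocab)
print(vocab)  # ['engines', 'mining', 'search', 'text', 'web']
```

In this toy example "web" survives term selection but appears in every page, so its idf is log(3/3) = 0 and it contributes nothing to any vector. This illustrates why a weighting step is usually combined with selection: frequent terms are not necessarily discriminative ones.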
Topics: Text analytics
1. Introduction
2. Web Mining
3. Web Data Preprocessing
4. Conclusions
References
Related Chapters
Part of Speech Tagging
International Conference on Computer and Automation Engineering, 4th (ICCAE 2012)
Research on Internet Chinese News Geography Name Text Mining
Proceedings of the International Conference on Technology Management and Innovation
Comparative Study of Text Representation Methods
International Conference on Information Technology and Computer Science, 3rd (ITCS 2011)
Topographic Processing of Very Large Text Datasets
Intelligent Engineering Systems through Artificial Neural Networks Volume 18
Related Articles
A Text Analytics Framework for Supplier Capability Scoring Supported by Normalized Google Distance and Semantic Similarity Measurement Methods
J. Comput. Inf. Sci. Eng. (October 2023)
A Framework Based on K-Means Clustering and Topic Modeling for Analyzing Unstructured Manufacturing Capability Data
J. Comput. Inf. Sci. Eng. (February 2020)
A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval
J. Mech. Des. (November 2017)