ASME Press Select Proceedings
International Conference on Advanced Computer Theory and Engineering, 5th (ICACTE 2012)
Xie Yi
The abundance of information on the WWW and people's demand for high-quality information have accelerated the development of efficient and effective search engines. Web text mining is one of the key techniques for search engines, but the complexity of Web data makes such mining difficult. To obtain good mining results, Web pages must be preprocessed before any text mining begins. Given the set of pages collected by a search engine's robot, we discuss the essential steps for representing pages as vectors, such as term selection and weight representation. The purpose is to prepare for the subsequent Web text mining task.
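As an illustration of what representing pages as vectors can look like, the following minimal Python sketch selects terms by document frequency and weights them with TF-IDF. The abstract does not state which selection criterion or weighting scheme the paper uses, so TF-IDF, the stop-word list, the `min_df` threshold, and all function names here are assumptions made only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def select_terms(token_lists, min_df=1):
    """Term selection (assumed criterion): keep terms found in at least min_df pages."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per page
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    return vocab, df


def tfidf_vectors(pages, min_df=1):
    """Return the selected vocabulary and one TF-IDF vector per page."""
    token_lists = [tokenize(p) for p in pages]
    vocab, df = select_terms(token_lists, min_df)
    n = len(pages)
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Weight (assumed scheme): raw term frequency times log(N / document frequency).
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


pages = ["web text mining", "web search engines"]
vocab, vectors = tfidf_vectors(pages)
```

In this sketch a term that occurs in every page, such as "web" in the two sample pages, receives weight zero, which reflects the usual intuition that such terms do not help distinguish pages from one another.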

