ASME Press Select Proceedings

International Conference on Information Technology and Computer Science, 3rd (ITCS 2011)

Editor
V. E. Muhin
National Technical University of Ukraine
W. B. Hu
Wuhan University
ISBN:
9780791859742
No. of Pages:
656
Publisher:
ASME Press
Publication date:
2011

The Internet presents users with a huge amount of information in a variety of formats, so extracting information quickly and effectively from these varied sources has become very important. This paper investigates a novel approach to extracting data from HTML sites, based on an in-depth study of HTMLParser. With this approach, hyperlinks and other formatted information can be extracted conveniently, and relevant pieces of HTML pages can be translated into XML. Alternatively, the data can be stored in an SQL database after the extracted details have been cleaned. We also extend HTMLParser to extract custom tags, which supports a wider range of applications. Experimental results confirm the feasibility of the approach.
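
The abstract gives no code, so the following is only a minimal sketch of the hyperlink-to-XML step it describes. It uses Python's standard-library `html.parser` and `xml.etree.ElementTree` as stand-ins, not the HTMLParser library studied in the paper. The names `LinkExtractor` and `links_to_xml` and the `<links>`/`<link>` output layout are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open hyperlink.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_to_xml(html):
    """Return an XML string listing every hyperlink found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element("links")  # assumed output layout, not from the paper
    for href, text in parser.links:
        node = ET.SubElement(root, "link", href=href)
        node.text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `links_to_xml` on a page fragment yields one `<link>` element per hyperlink, with the target in the `href` attribute and the anchor text as element content. Storing the same pairs in an SQL table instead would be a small change to the final loop.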
