Preprocessing of Web Server Logs from Online Newspaper
This paper proposes a new method for preprocessing in web usage mining. The data used in this experiment are web server logs from an online newspaper in Malaysia. The preprocessing stage consists of data cleaning and user identification. Python 2.6 is used as the main language for the data cleaning operations. The data cleaning process is explained in detail, along with the steps taken to perform user identification. The results of data cleaning and user identification from our experiment are also discussed. The output of this study is a cleaned log file that can be used in the next stage of web usage mining, namely pattern discovery.
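The abstract does not state the specific cleaning rules or the user identification heuristic. The sketch below shows one common combination from the web usage mining literature, not necessarily the method used in this paper. It makes several assumptions: the logs are in Apache Combined Log Format; cleaning keeps only successful (2xx) GET requests for pages, dropping embedded resources such as images, stylesheets, and scripts, as well as robot traffic; and each distinct (IP address, user agent) pair is treated as one user. The suffix list, robot markers, and function names (`parse_line`, `clean_log`, `identify_users`) are hypothetical, and the code is written for Python 3 rather than the Python 2.6 used in the study.

```python
import re

# Assumption: logs use the Apache Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rules: suffixes treated as embedded resources,
# and user-agent substrings treated as robots or crawlers.
IGNORED_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js', '.ico')
ROBOT_MARKERS = ('bot', 'crawler', 'spider')


def parse_line(line):
    """Return a dict of log fields, or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_clean(entry):
    """Keep successful GET requests for pages made by non-robot agents."""
    path = entry['url'].split('?', 1)[0].lower()  # ignore the query string
    if entry['method'] != 'GET':
        return False
    if not entry['status'].startswith('2'):  # drop errors and redirects
        return False
    if path.endswith(IGNORED_SUFFIXES):  # drop images, CSS, scripts, icons
        return False
    agent = entry['agent'].lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)


def clean_log(lines):
    """Parse raw log lines and drop malformed or irrelevant entries."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and is_clean(e)]


def identify_users(entries):
    """Tag each entry with a user ID, one per (IP address, user agent) pair."""
    user_ids = {}
    for entry in entries:
        key = (entry['ip'], entry['agent'])
        if key not in user_ids:
            user_ids[key] = len(user_ids) + 1
        entry['user'] = user_ids[key]
    return user_ids


# Usage on made-up sample lines (not data from the study).
sample = [
    '10.0.0.1 - - [01/Jan/2010:10:00:00 +0800] "GET /news/index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2010:10:00:01 +0800] "GET /img/logo.gif HTTP/1.1" 200 830 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2010:10:00:05 +0800] "GET /news/sport.html HTTP/1.1" 404 210 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2010:10:00:09 +0800] "GET /news/index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [01/Jan/2010:10:01:00 +0800] "GET /news/world.html HTTP/1.1" 200 4096 "-" "Opera/9.80"',
]
cleaned = clean_log(sample)
users = identify_users(cleaned)
print(len(cleaned), len(users))  # prints "2 2"
```

Treating each (IP address, user agent) pair as one user is only an approximation. Different people behind a shared proxy who use the same browser are merged, and one person using two browsers is split. In the sample, the first and last requests come from the same IP address but count as two users because their user agents differ.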