International Conference on Instrumentation, Measurement, Circuits and Systems (ICIMCS 2011)
324 Web Usage Cluster Analysis Based on Prediction Strength
This paper discusses the problems of selecting the number of clusters and selecting variables, based on user behavior data from a large website in China. By comparing the traditional model-based approach using the BIC criterion with a method based on prediction strength, we try to construct a general framework for web data cluster analysis that incorporates prediction strength. We obtain the following conclusions. In web usage analysis, the traditional parametric clustering method based on mixture models fails to solve the central problems of web usage cluster analysis. Prediction strength, which draws on nonparametric statistics and machine learning, is faster and more flexible than the model-based BIC criterion for selecting the number of clusters. A further advantage of prediction strength is its convenience for carrying out both variable selection and cluster number selection. Based on the special properties of web usage data, we present a clustering process for applications that combines data cleaning with clustering techniques.
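The abstract does not spell out how prediction strength is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It follows the commonly used definition due to Tibshirani and Walther: cluster the training and test halves separately, assign test points to the nearest training centroid, and for each test cluster measure the fraction of point pairs that the training centroids also place together; the prediction strength for k is the minimum of these fractions. The use of k-means as the clustering method, the farthest-point initialisation, the synthetic two-blob data, and the function names (`kmeans`, `prediction_strength`, `make_blobs`) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sqdist(point, centroids[j]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; farthest-point init keeps the sketch deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def prediction_strength(train, test, k):
    """Minimum, over test clusters, of the share of pairs kept together by train centroids."""
    train_centroids, _ = kmeans(train, k)
    _, test_labels = kmeans(test, k)
    predicted = [nearest(p, train_centroids) for p in test]
    scores = []
    for j in range(k):
        idx = [i for i, lab in enumerate(test_labels) if lab == j]
        if len(idx) < 2:
            continue  # assumption: clusters with no pairs are skipped
        pairs = list(combinations(idx, 2))
        agree = sum(predicted[a] == predicted[b] for a, b in pairs)
        scores.append(agree / len(pairs))
    return min(scores) if scores else 0.0


def make_blobs(rng, n_per_blob):
    """Synthetic stand-in data: two well-separated Gaussian blobs (not web usage data)."""
    centers = [(0.0, 0.0), (10.0, 10.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]


rng = random.Random(0)
train = make_blobs(rng, 30)
test = make_blobs(rng, 30)
strengths = {k: prediction_strength(train, test, k) for k in (1, 2, 3)}
```

In this scheme one would typically pick the largest k whose prediction strength stays above a chosen threshold (a value such as 0.8 is an assumption here, not a figure from the paper). Because the score only needs cluster assignments and not a likelihood, the same loop could be rerun over candidate variable subsets, which is one way to read the abstract's remark about convenience for variable selection.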