ASME Press Select Proceedings

International Conference on Instrumentation, Measurement, Circuits and Systems (ICIMCS 2011)

Chen Ming
ASME Press

This paper discusses two problems in cluster analysis, selecting the number of clusters and selecting variables, using behavioral data from a large website in China. By comparing the traditional model-based approach using BIC with a method based on prediction strength, we try to construct a general framework for cluster analysis of web data built on prediction strength. We reach the following conclusions. In web usage analysis, the traditional parametric clustering method based on mixture models fails to solve the central problems of web usage cluster analysis. Prediction strength, which draws on nonparametric statistics and machine learning, is faster and more flexible than the model-based BIC criterion for selecting the number of clusters. A further advantage of prediction strength is that it makes both variable selection and cluster number selection convenient to carry out. Based on the special properties of web usage data, we present a clustering process for applications that combines data cleaning with clustering techniques.
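The abstract does not spell out how prediction strength is computed, so the following is only a minimal sketch of the commonly cited definition (Tibshirani and Walther): cluster a training half and a test half separately, assign test points to the nearest training centroid, and for each test cluster measure the share of point pairs that stay together under that assignment; the prediction strength for k is the smallest such share. Plain k-means stands in here for whatever clustering method the paper uses, and the names `nearest`, `kmeans`, `prediction_strength`, `choose_k`, `two_blobs`, and the 0.8 threshold are illustrative assumptions, not taken from the paper.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids.
    (Assumed stand-in for the paper's clustering method.)"""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def prediction_strength(train, test, k):
    """Minimum, over test clusters, of the fraction of point pairs in that
    cluster that the training centroids also place in the same cluster."""
    train_centroids = kmeans(train, k)
    test_centroids = kmeans(test, k)
    test_labels = [nearest(p, test_centroids) for p in test]
    pred_labels = [nearest(p, train_centroids) for p in test]
    worst = 1.0
    for j in range(k):
        members = [i for i, lab in enumerate(test_labels) if lab == j]
        n = len(members)
        if n < 2:  # no pairs to compare; skipped by assumption
            continue
        same = sum(1 for a in members for b in members
                   if a != b and pred_labels[a] == pred_labels[b])
        worst = min(worst, same / (n * (n - 1)))
    return worst

def choose_k(train, test, k_max, threshold=0.8):
    """Largest k in 1..k_max whose prediction strength reaches the threshold.
    k = 1 always scores 1.0, so the list is non-empty for threshold <= 1."""
    good = [k for k in range(1, k_max + 1)
            if prediction_strength(train, test, k) >= threshold]
    return max(good)

def two_blobs(n, seed):
    """Synthetic demo data (hypothetical): two well-separated 2-D groups."""
    rng = random.Random(seed)
    return [(rng.gauss(c, 0.5), rng.gauss(c, 0.5))
            for c in (0.0, 10.0) for _ in range(n)]

if __name__ == "__main__":
    train, test = two_blobs(40, seed=1), two_blobs(40, seed=2)
    for k in range(1, 5):
        print(k, round(prediction_strength(train, test, k), 3))
    print("chosen k:", choose_k(train, test, k_max=4))
```

Because the score needs only cluster assignments and no likelihood, the same loop can be rerun on different subsets of variables, which is one way to read the abstract's remark that prediction strength makes variable selection and cluster number selection convenient to carry out.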

