172 Analyzing Large Data Clusters Evolving in Time by a Parallel SOM Based Approach
Datasets are not always time independent, so both information retrieval and data mining should be rethought, e.g., to avoid reporting to users data clusters that may become obsolete in the near future. Discovering how a cluster evolves may also help users study the dynamics of complex phenomena. The aim of this paper is therefore to propose a method for studying time-evolving clusters based on how the density of the data vectors belonging to each cluster changes over time. Data clusters and their evolution are visualized by tagging the data items, represented in their original space, with the clusters to which they belong. The paper shows how the user should organize the data items along two dimensions suitable for the problem at hand if the dataset does not have a physical nature or if the physical localization of the items is not relevant to the study. A Parallel SOM is used to study the evolution of clusters derived from massive datasets. Some examples illustrate the proposed methodology.
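The idea of following a cluster through the density of its data vectors can be sketched in a few lines. The Python code below is only an illustration under stated assumptions, not the paper's implementation: it trains a small serial SOM (the paper uses a Parallel SOM), treats each SOM unit as one cluster, and counts how many items fall on each unit in each time window. The function names (`train_som`, `best_matching_units`, `density_per_window`) and all parameter values are hypothetical.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows*cols, dim).

    Serial toy version; the decay schedules are assumptions, not the paper's.
    """
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the units from randomly chosen data vectors.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighbourhood
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def best_matching_units(data, weights):
    """Index of the closest SOM unit for every data vector."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def density_per_window(bmus, times, edges, n_units):
    """counts[w, u] = number of items in time window w mapped to unit u.

    Windows are [edges[w], edges[w+1]); times outside the overall range
    are assigned to the first or last window.
    """
    window = np.digitize(times, edges[1:-1])
    counts = np.zeros((len(edges) - 1, n_units), dtype=int)
    np.add.at(counts, (window, bmus), 1)
    return counts
```

Each column of the returned count matrix is the density history of one unit: a column whose counts shrink across windows suggests a cluster that is becoming obsolete, while a growing column suggests an emerging one. How units are grouped into clusters and how density is normalized are left open here.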