47 Discovery of Useful Concepts Using the Hierarchy of Attributes and Concepts
Clustering is the subject of active research in several fields, including statistics, pattern recognition, and machine learning. This research focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types, which imposes unique computational requirements on the relevant clustering algorithms. A variety of algorithms that meet these requirements have recently emerged and have been successfully applied to real-life data mining problems. In data mining, clustering is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean distance) to partition the database so that data points in the same partition are more similar to one another than to points in other partitions. In this paper, we study clustering algorithms for data with categorical attributes. Traditional clustering algorithms rely on distances between points, a concept that is not appropriate for Boolean and categorical attributes. Instead, we propose a novel concept, HAC (Hierarchy of Attributes and Concepts), to measure the similarity/proximity between a pair of data points. We present a robust clustering algorithm, also called HAC, that employs a hierarchy of concepts rather than distances when merging clusters. Our methods extend naturally to non-metric similarity measures, which are relevant in situations where a domain expert or a similarity table is the only source of knowledge. For data with categorical attributes, our findings indicate that HAC not only generates better-quality clusters than traditional algorithms but also exhibits good scalability properties.
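To illustrate the general idea of measuring similarity through a concept hierarchy rather than a distance, here is a minimal Python sketch. It is not the paper's HAC algorithm, whose details the abstract does not give. The hierarchy `PARENT`, the functions `value_similarity` and `record_similarity`, and the scoring rule (depth of the lowest common ancestor divided by the depth of the deeper value) are all assumptions chosen for illustration.

```python
# Hypothetical concept hierarchy: each value maps to its parent concept.
# The root ("thing") has parent None. Illustrative only, not from the paper.
PARENT = {
    "thing": None,
    "fruit": "thing",
    "vehicle": "thing",
    "apple": "fruit",
    "pear": "fruit",
    "car": "vehicle",
}


def path_to_root(value):
    """Return the list [value, parent, ..., root]."""
    path = []
    while value is not None:
        path.append(value)
        value = PARENT[value]
    return path


def depth(value):
    """Number of edges between the value and the root."""
    return len(path_to_root(value)) - 1


def value_similarity(a, b):
    """Assumed rule: depth of the lowest common ancestor of a and b,
    divided by the depth of the deeper of the two values."""
    if a == b:
        return 1.0
    ancestors_a = set(path_to_root(a))
    # Walk upward from b; the first node shared with a is the lowest
    # common ancestor. The root is always shared, so the loop always hits.
    for node in path_to_root(b):
        if node in ancestors_a:
            lca = node
            break
    deepest = max(depth(a), depth(b))
    return depth(lca) / deepest


def record_similarity(r1, r2):
    """Mean per-attribute similarity of two equal-length categorical records."""
    return sum(value_similarity(a, b) for a, b in zip(r1, r2)) / len(r1)


if __name__ == "__main__":
    print(value_similarity("apple", "pear"))
    print(value_similarity("apple", "car"))
    print(record_similarity(("apple", "car"), ("pear", "car")))
```

Under this assumed rule, two values that share a specific parent concept (such as "apple" and "pear" under "fruit") score higher than two values that share only the root. An agglomerative procedure could then merge the pair of clusters with the highest such similarity, with no need for a metric distance.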