25 Breaking a Hierarchical Clustering Algorithm with an Evolutionary Algorithm
Published: 2009
Neighbor-joining algorithms are a type of hierarchical clustering algorithm commonly used to construct “family trees” that visualize how data points are related to one another. These algorithms are fast but are known not to be robust to the addition or deletion of single data points. This study uses an evolutionary algorithm to find data sets on which a neighbor-joining algorithm is maximally and minimally stable. A new distance measure between trees is introduced and used to design a fitness function that measures the stability of a neighbor-joining algorithm on a data set. This yields a novel bootstrap test for the stability of a data set relative to any given neighbor-joining algorithm. Typical data are found to be highly unstable, suggesting that neighbor-joining algorithms should be used only on data where there is a natural reason to suspect an underlying hierarchical structure, such as gene sequences of related organisms.
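The abstract does not spell out the chapter's tree distance or fitness function, so the following is only a rough sketch of the general setup under stated assumptions, not the authors' method. It builds a tree with the standard Saitou-Nei neighbor-joining Q-criterion (topology only). It compares trees with the normalized Robinson-Foulds distance, a stand-in for the chapter's new measure. It scores instability as the mean change in tree topology when each single point is deleted, and it drives a minimal (1+1) evolutionary search over 2-D point sets with that score as fitness. The function names (`neighbor_joining_splits`, `loo_instability`, `evolve`), the Euclidean distance, and the mutation scale of 0.05 are all hypothetical choices. The sketch requires Python 3.8+ for `math.dist`.

```python
import math
import random
from itertools import combinations


def canonical_splits(sides, leaves):
    """Restrict each bipartition to `leaves`, normalise it to the side that
    excludes the smallest leaf, and drop trivial splits (a single leaf)."""
    ref = min(leaves)
    out = set()
    for s in sides:
        s = s & leaves
        if ref in s:
            s = leaves - s
        if 1 < len(s) < len(leaves) - 1:
            out.add(s)
    return out


def neighbor_joining_splits(labels, dist):
    """Saitou-Nei neighbor joining on integer `labels` with distance
    dist(a, b); returns the non-trivial splits of the unrooted tree."""
    nodes = [frozenset([x]) for x in labels]  # each node = set of leaves
    D = {a: {} for a in nodes}
    for a, b in combinations(nodes, 2):
        D[a][b] = D[b][a] = dist(min(a), min(b))
    created = []
    while len(nodes) > 3:  # the last three nodes meet at one centre
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q-criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = i | j
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - D[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        created.append(u)  # the edge above u induces the split u | rest
    return canonical_splits(created, frozenset(labels))


def normalized_rf(a, b, n_leaves):
    """Robinson-Foulds distance scaled by its maximum 2(n - 3)."""
    return len(a ^ b) / (2 * (n_leaves - 3))


def loo_instability(n, dist):
    """Mean normalised RF distance between the full tree (restricted to the
    survivors) and the tree rebuilt after deleting each item; needs n >= 5.
    0.0 means no single deletion ever changes the topology."""
    labels = list(range(n))
    full = neighbor_joining_splits(labels, dist)
    total = 0.0
    for r in labels:
        rest = frozenset(labels) - {r}
        restricted = canonical_splits(full, rest)
        reduced = neighbor_joining_splits(sorted(rest), dist)
        total += normalized_rf(restricted, reduced, len(rest))
    return total / n


def point_instability(points):
    """Instability of 2-D points under Euclidean distance."""
    return loo_instability(len(points),
                           lambda a, b: math.dist(points[a], points[b]))


def evolve(n_points, generations, maximize, seed=0):
    """(1+1) evolutionary search: mutate every point with Gaussian noise and
    keep the child if its fitness is at least as good as the parent's."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    best = point_instability(pts)
    sign = 1 if maximize else -1
    for _ in range(generations):
        child = [(x + rng.gauss(0, 0.05), y + rng.gauss(0, 0.05))
                 for x, y in pts]
        f = point_instability(child)
        if sign * f >= sign * best:
            pts, best = child, f
    return pts, best


if __name__ == "__main__":
    _, worst = evolve(8, 200, maximize=True)
    _, most_stable = evolve(8, 200, maximize=False)
    print(f"most unstable: {worst:.3f}, most stable: {most_stable:.3f}")
```

On distances that come exactly from a tree (an additive metric), neighbor joining recovers the true topology, so the score is 0.0. Searching with `maximize=True` instead looks for point sets whose tree changes as much as possible under single deletions. That is the spirit of "breaking" the algorithm, though the chapter's actual fitness function and bootstrap test may differ.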