International Conference on Software Technology and Engineering (ICSTE 2012)
46 The Use of Hash Table for Building the Distance Matrix in a Pair-Wise Sequence Alignment
In bioinformatics, distance matrices are used for many purposes, such as clustering sequences, representing protein structures without relying on coordinates, constructing phylogenetic trees, and building multiple sequence alignments. Pair-wise alignment plays a significant role in the construction of distance matrices because it rates the similarity, and hence the distance, between each pair of sequences. The N-Gram-Hirschberg (NGH) algorithm is a fast dynamic-programming pair-wise alignment algorithm that produces the same optimal results as the Smith-Waterman algorithm. In this paper, we present the Hash Table-N-Gram-Hirschberg (HT-NGH) method, a new and practical method for constructing a distance matrix using pair-wise alignment. HT-NGH uses a hash table to speed up the transformation process of two earlier methods, NGH and Hashing-N-Gram-Hirschberg (H-NGH). The proposed enhancement improves on the running time of H-NGH without sacrificing space complexity. Overall, the run time of our algorithm outperforms that of NGH and H-NGH by 60% and 30%, respectively. In addition, the transformation phase of HT-NGH has complexity O(min(N, M)/w), compared to O(min(N, M)) for NGH.
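To make the idea of a hash-table transformation feeding a pair-wise distance matrix concrete, the following is a minimal Python sketch and not the HT-NGH implementation itself. Several details are assumptions made only for illustration: the sequence is cut into non-overlapping words of length `w`, a plain Needleman-Wunsch score with a linear gap penalty stands in for the Hirschberg-based alignment, and the distance is taken as one minus a normalized score. The function names and scoring parameters are hypothetical.

```python
from itertools import combinations


def ngram_transform(seq, w, table):
    """Map each non-overlapping word of length w to an integer code.

    `table` is a dict used as the hash table, so each lookup-or-insert
    takes O(1) time on average. (Assumption: non-overlapping words.)
    """
    codes = []
    for i in range(0, len(seq), w):
        gram = seq[i:i + w]
        # Reuse the existing code for a known word, else assign the next one.
        codes.append(table.setdefault(gram, len(table)))
    return codes


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score keeping only two rows in memory.

    This is a stand-in for the alignment step, not Hirschberg's algorithm:
    it returns the optimal score but does not recover the alignment.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance_matrix(seqs, w=2):
    """Build a symmetric distance matrix from pair-wise alignment scores."""
    table = {}  # one hash table shared by all sequences
    coded = [ngram_transform(s, w, table) for s in seqs]
    n = len(coded)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        longest = max(len(coded[i]), len(coded[j]))
        if longest == 0:
            d = 0.0  # two empty sequences are treated as identical
        else:
            score = global_alignment_score(coded[i], coded[j])
            # Assumed distance: 1 - normalized score, negative scores clipped.
            d = 1.0 - max(score, 0) / longest
        dist[i][j] = dist[j][i] = d
    return dist
```

Because the alignment runs on the coded sequences, each of length roughly 1/w of the original, the dynamic-programming table shrinks accordingly. A call such as `distance_matrix(["ACGTAC", "ACGTAC", "TTTTTT"], w=2)` gives distance 0 between the two identical sequences and distance 1 between either of them and the third.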