International Conference on Software Technology and Engineering (ICSTE 2012)
46 The Use of Hash Table for Building the Distance Matrix in a Pair-Wise Sequence Alignment
In bioinformatics, distance matrices serve many purposes, such as clustering sequences, representing protein structures without relying on coordinates, constructing phylogenetic trees, and building multiple sequence alignments. Pair-wise alignment plays a central role in constructing distance matrices because it scores the similarity, and hence the distance, between each pair of sequences. The N-Gram-Hirschberg (NGH) algorithm is a fast, dynamic-programming pair-wise alignment algorithm that produces the same optimal results as the Smith-Waterman algorithm. In this paper, we present the Hash Table-N-Gram-Hirschberg (HT-NGH) method, a new and practical method for constructing a distance matrix using pair-wise alignment. HT-NGH uses a hash table to speed up the transformation phase of its two predecessors, NGH and Hashing-N-Gram-Hirschberg (H-NGH). This enhancement reduces run time relative to H-NGH without sacrificing space complexity. Overall, our algorithm's run time improves on NGH and H-NGH by 60% and 30%, respectively. In addition, the transformation phase of HT-NGH has complexity O(min(NM)/w), compared to O(min(NM)) for NGH.
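The general pipeline the abstract describes has three steps. First, each sequence is transformed into n-grams, with a hash table assigning every distinct n-gram a code. Second, each pair of coded sequences is aligned with dynamic programming. Third, the pair-wise scores fill a symmetric distance matrix. The sketch below illustrates that pipeline. It is not the authors' HT-NGH implementation, and it rests on several assumptions:

- Sequences are cut into non-overlapping n-grams.
- A Python `dict` serves as the hash table.
- A linear-space Levenshtein (edit) distance stands in for NGH's Hirschberg-based alignment. Unlike Hirschberg's method, this stand-in computes only the score and does not recover the alignment.
- The helper names `ngram_transform`, `edit_distance`, and `distance_matrix` are hypothetical.

Distances computed on n-gram codes are coarser than residue-level distances, so this sketch does not reproduce the paper's equivalence to Smith-Waterman.

```python
from itertools import combinations


def ngram_transform(seq, n, table):
    """Encode seq as integer codes, one per non-overlapping n-gram (assumed scheme).

    `table` is a dict (Python's built-in hash table) shared by all sequences,
    so identical n-grams always receive the same code. A trailing fragment
    shorter than n is treated as its own n-gram.
    """
    codes = []
    for i in range(0, len(seq), n):
        gram = seq[i:i + n]
        # setdefault assigns the next unused code the first time a gram is seen.
        codes.append(table.setdefault(gram, len(table)))
    return codes


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping only two rows (linear space)."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter sequence so each row is as short as possible
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # match or substitution
        prev = curr
    return prev[-1]


def distance_matrix(seqs, n=2):
    """Symmetric matrix of pair-wise distances between n-gram-coded sequences."""
    table = {}
    encoded = [ngram_transform(s, n, table) for s in seqs]
    k = len(seqs)
    dist = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d = edit_distance(encoded[i], encoded[j])
        dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    print(distance_matrix(["ACGTACGT", "ACGTTCGT", "TTTTACGT"], n=2))
    # prints [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
```

Sharing one table across all sequences guarantees that equal n-grams compare as equal codes. Each distinct n-gram is looked up in average constant time rather than by scanning a list of previously seen n-grams.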