Indexing Genome with the External Construction of Compressed Suffix Tree Using LCP Array

Authors

  • Vijay Kumar Vishwakarma Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh - 473 226, India
  • Abhishek Srivastava Department of Computer Science and Engineering, Jaypee University of Engineering and Technology

DOI:

https://doi.org/10.51983/ajeat-2013.2.1.652

Keywords:

Genome Indexing, Compressed Suffix Tree, Data Structure, DNA Indexing

Abstract

We are proposing the genome indexing algorithm, which depends upon compressed form of suffix trees, in which every node has four parts; suffix array number, suffix start number, LCP count, and a pointer to another node. The proposed algorithm does not use the whole suffix array, it just takes some necessary information like LCP of two suffix array, compare them and suffix start number, to align the suffix to proper position and suffix array number to distinguish among all the partitions. The use of compressed suffix array minimizes the number of trees, eventually; it also minimizes the random access to input data, as it creates the compressed suffix tree for two suffix arrays using pairwise sorting, sequentially.

References

Benjarath Phoophakdee and Mohammed J. Zaki, “Genome-scale disk-based suffix tree indexing”. SIGMOD ‘07: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM. pp. 833–844, 2007.

Marina Barsky, Ulrike Stege, Alex Thomo, and Chris Upton . “A new method for indexing genomes using on-disk suffix trees”. CIKM ‘08: Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM. pp. 649–658, 2008.

http://en.wikipedia.org/wiki/Suffix_tree

Donald R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information Coded in Alphanumeric”, Journal of the ACM, Vol. 15, NO 4, pp. 514-534, 1968.

Niko Välimäki, Wolfgang Gerlach, Kashyap Dixit and Veli Mäkinen, “Compressed Suffix Tree - A Basis for Genome-scale Sequence Analysis”. Bioinformatics, 23(5), Application note, pp 629-630, 2007

Simon Gog, Enno Ohlebusch, “Fast and Lightweight LCP-Array Construction Algorithms”, ALENEX, 2011.

N.J. Larsson and K. Sadakane, “Faster suffix sorting”, Tech. Rep. LUCS-TR: 99-214 of the Dept. of Comp. Sc., Lund University, Sweden, 1999.

http://en.wikipedia.org/wiki/Lexicographical_order

Md. Jahangir Alam, Muhammad Monsur Uddin, Mohammad Shabbir Hasan, Abdullah Al Mahmood, “ Pair Wise Sorting: A New Way of Sorting”, International Journal of Computer Science and Information Security, Volume 8:9, pp. 116-120,2010.

Simonas Salteni, “External memory sorting”, Deptt. Of Computer Science, Aalborg University, Denmark, LNCS, pp 1-7, 2001. 11

Downloads

Published

05-05-2013

How to Cite

Vishwakarma, V. K., & Srivastava, A. (2013). Indexing Genome with the External Construction of Compressed Suffix Tree Using LCP Array. Asian Journal of Engineering and Applied Technology, 2(1), 7–11. https://doi.org/10.51983/ajeat-2013.2.1.652