Issue
Korean Journal of Chemical Engineering,
Vol.22, No.3, 345-352, 2005
A Gene Clustering Method with Masking Cross-matching Fragments Using Modified Suffix Tree Clustering Method
Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment methods rely on pairwise alignment and Smith-Waterman algorithm [Needleman and Wunsch, 1970; Smith and Waterman, 1981] to generate an alignment hierarchy. Therefore, as the number of sequences increases, the runtime increases exponentially. To resolve this problem, this paper presents a multiple sequence alignment method using a parallel processing suffix tree algorithm to search for common subsequences at one time without pairwise alignment. The cross-matched subsequences among the searched common subsequences may be generated and those cause inexact-matching. So the procedure of masking cross-matching pairs was suggested in this study. The proposed method, improved STC (Suffix Tree Clustering), is summarized as follows: (1) construction of suffix tree; (2) search and overlap of common subsequences; (3) grouping of subsequence pairs; (4) masking of cross-matching pairs; and (5) clustering of gene sequences. The new method was successfully evaluated with 23 genes in Mus musculus and 22 genes in three species, clustering nine and eight clusters, respectively.
[References]
  1. Chen JY, Carlis JV, Inf. Syst., 28, 287, 2003
  2. Choi SH, Manousiouthakis V, Korean J. Chem. Eng., 19(2), 227, 2002
  3. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL, Nucleic Acids Res., 27(11), 2369, 1999
  4. Delcher AL, Phillippy A, Carlton J, Salzberg SL, Nucleic Acids Res., 30(11), 2478, 2002
  5. Gusfield D, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, London, 1997
  6. Higgins DG, Thompson JD, Gibson TJ, Methods Enzymol., 266, 383, 1996
  7. Higgins DG, Sharp PM, Gene, 73, 237, 1988
  8. Hon WK, Sadakane K, "Space-Economical Algorithms for Finding Maximal Unique Matches", Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, 144, 2002
  9. Kalyanaraman A, Aluru S, Kothari S, Parallel EST Clustering, HICOMB 2002, 185, 2002
  10. Kim DK, Lee KS, Yang DR, Korean J. Chem. Eng., 21(5), 942, 2004
  11. Lee JM, Lee JH, Korean J. Chem. Eng., 21(2), 338, 2004
  12. McCreight E, J. ACM, 23, 262, 1976
  13. Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA, Genome Res., 9, 1143, 1999
  14. Morgenstem B, Frech K, Dress A, Wemer T, Bioinformatics, 14, 290, 1998
  15. Mount DW, Bioinformatics : Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, 2001
  16. Needleman SB, Wunsch CD, J. Mol. Biol., 48, 443, 1970
  17. Notredame C, Higgins DG, Nucleic Acids Res., 24, 1515, 1996
  18. Ostell JM, Wheelan SJ, Kans JA, Methods Biochem. Anal., 43, 19, 2001
  19. Pearson WR, Miller W, Methods Enzymol., 210, 575, 1992
  20. Phillips A, Janies D, Wheeler W, Molecular Phylogenetics and Evolution, 16, 317, 2000
  21. Randal LS, Christiansen T, Learning Perl, Second Edition, O'Reilly, 1997
  22. Salzberg SL, Searls DB, Kasif S, Trends Guide to Bioinformatics, Elsevier Science, 1998
  23. Shin PK, Koo JH, Lee WJ, Seo JH, Korean J. Chem. Eng., 13(1), 82, 1996
  24. Smith TF, Waterman MS, J. Mol. Biol., 197, 723, 1981
  25. Thompson JD, Higgins DG, Gibson TJ, Nucleic Acids Res., 22, 4673, 1994
  26. Tisdall JD, Beginning Perl for Bioinformatics, O'REILLY, 2001
  27. Ukkonen E, Algorithmica, 14, 249, 1995
  28. Volfovsky N, Haas BJ, Salzberg SL, Genome Biology, 2, 1, 2001
  29. Weiner P, "Linear Pattern Matching Algorithms", In Proc. of the 14th IEEE Annual Symposium on Switching and Automata Theory, 1, 1973
  30. Zamir O, Etzioni O, Madani O, Karp RM, "Fast and Intuitive Clustering of Web Documents", In Proc. of the 3nd International Conference on Knowledge Discovery and Data Mining, 287, 1997
  31. Zamir O , Etzioni O, "Web Document Clustering: A Feasibility Demonstration", In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 46, 1998