Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
Joint Authors
Amorim, António
Soares, Inês
Goios, Ana
Source
The Scientific World Journal
Issue
Vol. 2012, Issue 2012 (31 Dec. 2012), pp.1-4, 4 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2012-09-10
Country of Publication
Egypt
No. of Pages
4
Main Subjects
Natural & Life Sciences (Multidisciplinary)
Medicine
Information Technology and Computer Science
Abstract EN
The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions).
In such cases, an alternative alignment-free method would prove valuable.
Our method starts by computing a generalized suffix tree of all sequences, which is completed in linear time.
Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated.
Based on the L-word frequency profile of each sequence, the standard Euclidean distance between each pair of sequences is then computed, producing a symmetric genetic distance matrix that can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph.
We present an improvement to word-counting alignment-free approaches for sequence comparison by determining a single optimal word length and combining suffix tree structures with the word-counting task.
Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes.
The algorithm was implemented in Python and is freely available on the web.
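The pipeline described in the abstract (L-word frequency profiles followed by pairwise Euclidean distances) can be illustrated with a minimal sketch. This is not the authors' implementation: it counts words with a plain sliding window instead of a generalized suffix tree, it assumes a DNA alphabet of A, C, G and T, and it assumes relative (normalized) frequencies. The names lword_profile and distance_matrix are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word over `alphabet` in `seq`.

    Assumption: a sliding window stands in for the suffix tree lookup.
    Windows containing symbols outside `alphabet` (e.g. N) are ignored.
    """
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    # Enumerate all |alphabet|**L words in a fixed order so that
    # profiles of different sequences are directly comparable.
    words = ["".join(p) for p in product(alphabet, repeat=L)]
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise Euclidean distances between profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sqrt(sum((a - b) ** 2
                         for a, b in zip(profiles[i], profiles[j])))
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    example = ["ACGTACGT", "ACGTTCGT", "AAAAAAAA"]
    for row in distance_matrix(example, L=2):
        print(["%.3f" % x for x in row])
```

The resulting matrix is the kind of input a neighbor-joining or multidimensional scaling routine would take. The sliding window costs time proportional to the sequence length times L for each sequence, which is adequate for a sketch. The abstract's suffix tree approach is what gives the linear-time construction the authors report.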
American Psychological Association (APA)
Soares, Inês & Goios, Ana & Amorim, António. 2012. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency. The Scientific World Journal, Vol. 2012, no. 2012, pp. 1-4.
https://search.emarefa.net/detail/BIM-472498
Modern Language Association (MLA)
Soares, Inês…[et al.]. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency. The Scientific World Journal No. 2012 (2012), pp.1-4.
https://search.emarefa.net/detail/BIM-472498
American Medical Association (AMA)
Soares, Inês & Goios, Ana & Amorim, António. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency. The Scientific World Journal. 2012. Vol. 2012, no. 2012, pp. 1-4.
https://search.emarefa.net/detail/BIM-472498
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-472498