Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array

Publication Date

2015-07-31

Country of Publication

Jordan

No. of Pages

Main Subjects

Media and Communication

Topics

Data mining

Abstract EN

Sequence alignment is a bioinformatics application that determines the degree of similarity between nucleotide sequences that are assumed to share an ancestral relationship.

This sequence alignment method reads a query sequence from the user, aligns it against large genomic sequence data sets, and locates targets that are similar to the input query sequence.

Existing accurate algorithms, such as Smith-Waterman and FASTA, are computationally very expensive, which limits their use in practice.
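
To illustrate why exact methods are costly, here is a minimal sketch of Smith-Waterman local alignment scoring. It fills a dynamic-programming table with one cell per pair of query and target positions, so the work grows with the product of the two sequence lengths. The function name and the scoring values (match=2, mismatch=-1, gap=-1) are illustrative assumptions, not parameters taken from the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty).

    Scoring values are assumptions chosen for illustration only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
```

Every cell is visited once, so aligning a query of length m against a database of total length n takes time proportional to m times n, which is the cost that heuristic tools try to avoid.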

The existing search tools, such as BLAST and WU-BLAST, employ heuristics to improve the speed of such searches.

However, such heuristics can sometimes miss targets, which in many cases is undesirable.

Considering the rapid growth of database sizes, this problem demands ever-growing computational resources and remains a computational challenge.

Most common sequence alignment algorithms, such as BLAST, WU-BLAST, and SCT, search a given query sequence against a set of database sequences.

In this paper, the BioDBMPHF Tool has been developed to find pairwise local sequence alignments by preprocessing the database.

Preprocessing consists of finding the Longest Common Substring (LCS) among the database sequences that have the highest local similarity with a given query sequence, and of reducing the size of the database based on frequent common subsequences.

In the BioDBMPHF Tool, a fine-tuned enhanced suffix array is constructed and used to find the LCS.
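
The general idea of finding a longest common substring with a suffix array can be sketched as follows. This is not the paper's fine-tuned enhanced suffix array: it is a plain suffix array built by naive sorting, combined with an LCP table computed by Kasai's algorithm. The function name and the separator character are assumptions made for illustration.

```python
def longest_common_substring(s, t):
    """Return one longest common substring of s and t.

    Illustrative sketch: naive suffix sorting plus Kasai's LCP algorithm.
    """
    sep = "\x00"  # assumed not to occur in either sequence
    text = s + sep + t
    n = len(text)

    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(n), key=lambda i: text[i:])

    # rank[i] = position of suffix i in the suffix array.
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    # lcp[r] = length of the common prefix of suffixes sa[r - 1] and sa[r].
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    # The answer is the largest LCP between adjacent suffixes that
    # start in different input strings.
    split = len(s)
    best_len, best_pos = 0, 0
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        if (a < split) != (b < split) and lcp[r] > best_len:
            best_len, best_pos = lcp[r], b
    return text[best_pos:best_pos + best_len]


if __name__ == "__main__":
    print(longest_common_substring("ACGTACGT", "TTACGG"))
```

Sorting whole suffixes is simple but slow for large inputs; practical tools use linear or near-linear suffix array construction, which is the kind of cost the enhanced structure described in the abstract is meant to reduce.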

Experimental results show that the HashIndex algorithm reduces the time and space complexity of accessing the LCS.

The time complexity of the HashIndex algorithm to find the LCS is O(2 + γ), where γ is the time taken to access the pattern.

The space complexity of the fine-tuned enhanced suffix array is 5n bytes per character for the reduced enhanced LCP table, and storing the bucket table requires 32 bytes.

A data mining technique is used to cross-validate the results.

It is shown that the developed BioDBMPHF Tool effectively compresses the database and obtains the same results as the traditional algorithms in approximately half the time they take, thereby reducing the time complexity.

American Psychological Association (APA)

Kunthavai A. & Vasantharathna S. & Thirumurugan S. 2015. Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array. The International Arab Journal of Information Technology, Vol. 12, no. 4.
https://search.emarefa.net/detail/BIM-431223

Modern Language Association (MLA)

Kunthavai A.…[et al.]. Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array. The International Arab Journal of Information Technology Vol. 12, no. 4 (Jul. 2015).
https://search.emarefa.net/detail/BIM-431223

American Medical Association (AMA)

Kunthavai A. & Vasantharathna S. & Thirumurugan S. Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array. The International Arab Journal of Information Technology. 2015. Vol. 12, no. 4.
https://search.emarefa.net/detail/BIM-431223

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-431223
