Spaced Seed Data Structures for De Novo Assembly

Co-authors

Birol, Inanç
Chu, Justin
Mohamadi, Hamid
Jackman, Shaun D.
Raghavan, Karthika
Vandervalk, Benjamin P.
Raymond, Anthony
Warren, René L.

Source

International Journal of Genomics

Issue

Vol. 2015, no. 2015 (31 December 2015), pp. 1-8, 8 p.

Publisher

Hindawi Publishing Corporation

Publication date

2015-10-11

Country of publication

Egypt

Number of pages

8

Main subjects

Biology

Abstract (EN)

De novo assembly of the genome of a species is essential in the absence of a reference genome sequence.

Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences.

Although longer subsequences unlock longer genomic features for assembly, the associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads.

Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences.

These data structures address memory and run time constraints imposed by longer reads.

We observe that when a fixed distance separates seed pairs, sequence specificity increases with gap length.

Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors.

Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds.

These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
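The abstract's idea of storing paired subsequences implicitly in a Bloom filter can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `spaced_seed_pairs`, the class `BloomFilter`, the `pair_key` helper, the parameter names `k` and `gap`, and the choice of salted BLAKE2b hashing are all assumptions made for illustration only.

```python
import hashlib
from typing import Iterator, Tuple


def spaced_seed_pairs(read: str, k: int, gap: int) -> Iterator[Tuple[str, str]]:
    """Yield (left, right) k-mer pairs separated by a fixed gap of `gap` bases.

    Hypothetical helper: the names and layout are illustrative assumptions.
    """
    span = 2 * k + gap  # total read length covered by one seed pair
    for i in range(len(read) - span + 1):
        left = read[i:i + k]
        right = read[i + k + gap:i + span]
        yield left, right


def pair_key(left: str, right: str) -> str:
    """Join a seed pair into a single string key for hashing."""
    return left + ":" + right


class BloomFilter:
    """A plain Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive independent hash values by salting BLAKE2b with the hash index.
        for index in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("ascii"),
                digest_size=8,
                salt=index.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seeds = BloomFilter(num_bits=1024, num_hashes=3)
    for left, right in spaced_seed_pairs("ACGTACGTTGCA", k=3, gap=2):
        seeds.add(pair_key(left, right))
    # Every inserted pair is reported as present; absent pairs may
    # occasionally be reported as present too (false positives).
    print(pair_key("ACG", "CGT") in seeds)
```

In this sketch the filter records only membership, so the bases inside the gap are never stored. Tracking seed frequencies, as the abstract mentions, would need counters in place of single bits; that variant is not shown here.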

American Psychological Association (APA) citation style

Birol, Inanç & Chu, Justin & Mohamadi, Hamid & Jackman, Shaun D. & Raghavan, Karthika & Vandervalk, Benjamin P. … [et al.]. 2015. Spaced Seed Data Structures for De Novo Assembly. International Journal of Genomics, Vol. 2015, no. 2015, pp. 1-8.
https://search.emarefa.net/detail/BIM-1065994

Modern Language Association (MLA) citation style

Birol, Inanç … [et al.]. Spaced Seed Data Structures for De Novo Assembly. International Journal of Genomics No. 2015 (2015), pp. 1-8.
https://search.emarefa.net/detail/BIM-1065994

American Medical Association (AMA) citation style

Birol, Inanç & Chu, Justin & Mohamadi, Hamid & Jackman, Shaun D. & Raghavan, Karthika & Vandervalk, Benjamin P. … [et al.]. Spaced Seed Data Structures for De Novo Assembly. International Journal of Genomics. 2015. Vol. 2015, no. 2015, pp. 1-8.
https://search.emarefa.net/detail/BIM-1065994

Data type

Articles

Text language

English

Notes

Includes bibliographical references

Record number

BIM-1065994