Title
The fragment assembly string graph
Abstract
We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time and space efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into kmers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.
Contact: gene@eecs.berkeley.edu
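To illustrate what transitive reduction means for read overlaps, here is a minimal sketch. It is not the linear expected time algorithm from the paper: it is a naive reachability check on a small acyclic overlap graph, and it assumes a simplified model in which reads are plain node labels and read orientation and overlap lengths are ignored. The function name `transitive_reduction` and the read names are hypothetical.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path.

    Naive illustration only: each edge triggers its own graph search, so
    this is far slower than the linear expected time method the abstract
    refers to.
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reachable(start, target, skip_edge):
        # Depth-first search from start that never traverses skip_edge.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    # Keep an edge only if no alternative path connects its endpoints.
    return sorted({(u, v) for u, v in edges if not reachable(u, v, (u, v))})


# Read A overlaps B, B overlaps C, and A also overlaps C directly.
overlaps = [("A", "B"), ("B", "C"), ("A", "C")]
reduced = transitive_reduction(overlaps)
print(reduced)
```

In this toy input the overlap from A to C is implied by the chain A to B to C, so it is the one edge the reduction drops.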
Year
2005
DOI
10.1093/bioinformatics/bti1114
Venue
ECCB/JBI
Keywords
de Bruijn graph, DNA sequence
Field
Shotgun sequencing, Transitive reduction, Computer science, Spacetime, Algorithm, Theoretical computer science, String graph, De Bruijn graph, Formalism (philosophy), Bioinformatics, Scalability
DocType
Conference
Volume
21
Issue
2
ISSN
1367-4803
Citations
104
PageRank
8.52
References
5
Authors
1
Name
Eugene Myers
Order
1
Citations
3164
PageRank
496.92