Title
Poster: Scaffolding draft genomes using paired sequencing data
Abstract
The number of sequenced genomes is growing rapidly thanks to the low cost and wide availability of high-throughput DNA sequencing platforms. However, most genome sequences are incomplete and consist of large numbers of contigs separated by gaps. The process of ordering and orienting these contigs, typically using pairs of reads separated by an approximately known distance in the genome, is known as scaffolding. Scaffolding algorithms were introduced along with the first genome assemblers and, like those assemblers, were designed to work with pairs of relatively long Sanger reads. The length of these reads ensures that the majority of them map correctly onto contigs. Current sequencing platforms generate hundreds of millions of much shorter reads in each experiment, and the shortness of the reads causes a large amount of non-unique and incorrect mapping. This poster presents ongoing work on designing a scaffolding strategy appropriate for this type of data. The algorithm can scaffold contigs using paired-end or mate-pair sequencing data from multiple platforms. The reads must first be mapped against the contigs using any tool that reports multiple alignments for each read and can generate SAM output. Read pairs containing at least one read that is not uniquely mapped are removed from consideration. Contigs are annotated using RepeatMasker and RepeatModeler, and read pairs in which at least one read maps within an annotated repeat are removed. Finally, read pairs whose reads map to two different contigs are removed if the minimum insert size implied by the mapping exceeds the expected insert size by more than 3 standard deviations.
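The three read-pair filters described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The `Hit` record, the `n_hits` field, the repeat-interval dictionary, and all function names are hypothetical. The abstract also does not say how the minimum implied insert size is computed, so the sketch assumes inward-facing paired-end reads and a zero-length gap between the two contigs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One read alignment (hypothetical record, e.g. parsed from SAM output)."""
    contig: str
    start: int      # 0-based leftmost aligned position
    end: int        # exclusive end of the alignment
    forward: bool   # True if the read aligns to the forward strand
    n_hits: int     # number of alignments the mapper reported for this read


# Annotated repeats per contig: contig name -> list of (start, end) intervals.
Repeats = Dict[str, List[Tuple[int, int]]]


def overlaps_repeat(hit: Hit, repeats: Repeats) -> bool:
    """True if the alignment overlaps any annotated repeat on its contig."""
    return any(hit.start < r_end and r_start < hit.end
               for r_start, r_end in repeats.get(hit.contig, []))


def span_to_contig_edge(hit: Hit, contig_len: int) -> int:
    """Bases from the read's outer end to the contig edge it points toward.

    Assumption: a forward read points at the right edge of its contig and
    a reverse read points at the left edge.
    """
    return contig_len - hit.start if hit.forward else hit.end


def min_implied_insert(a: Hit, b: Hit, contig_lens: Dict[str, int]) -> int:
    """Smallest insert size consistent with the mapping (zero-length gap)."""
    return (span_to_contig_edge(a, contig_lens[a.contig])
            + span_to_contig_edge(b, contig_lens[b.contig]))


def keep_pair(a: Hit, b: Hit, repeats: Repeats, contig_lens: Dict[str, int],
              mean_insert: float, sd_insert: float) -> bool:
    """Apply the three filters in the order the abstract lists them."""
    # 1. Drop pairs in which either read is not uniquely mapped.
    if a.n_hits != 1 or b.n_hits != 1:
        return False
    # 2. Drop pairs in which either read falls within an annotated repeat.
    if overlaps_repeat(a, repeats) or overlaps_repeat(b, repeats):
        return False
    # 3. For pairs spanning two contigs, drop those whose minimum implied
    #    insert exceeds the expected insert by more than 3 standard deviations.
    if a.contig != b.contig:
        if min_implied_insert(a, b, contig_lens) > mean_insert + 3 * sd_insert:
            return False
    return True
```

Mate-pair libraries usually have a different read orientation than paired-end libraries, so `span_to_contig_edge` would need to be adapted for them; the filter order and the 3-standard-deviation threshold come from the abstract, while everything else here is an assumption.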
Year
2011
DOI
10.1109/ICCABS.2011.5729905
Venue
Computational Advances in Bio and Medical Sciences
Keywords
genome sequence, genome assembler, scaffolding algorithm, scaffolding strategy, high throughput dna, current sequencing platform, large number, incorrect mapping, large amount, ongoing work, scaffolding draft, integer linear programming, genomics, gallium, algorithm design and analysis, algorithm design, availability, molecular biophysics
Field
Genome, Assemblers, High-Throughput DNA Sequencing, Hybrid genome assembly, Deep sequencing, Biology, Genomics, Contig, Bioinformatics, Genetics
DocType
Conference
ISBN
978-1-61284-851-8
Citations
0
PageRank
0.34
References
0
Authors
9
Name            Order   Citations   PageRank
J. Lindsay      1       0           0.34
J. Zhang        2       0           0.34
T. Farnham      3       0           0.34
Y. Wu           4       1178        139.36
I. Mandoiu      5       13          2.05
R. O'Neill      6       0           0.34
H. Salooti      7       0           0.68
E. Bullwinkel   8       0           0.34
A. Zelikovsky   9       289         38.30