Title
COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly.
Abstract
Motivation: The rise of next-generation sequencing technologies provides an unprecedented opportunity for elucidating genetic mysteries, yet the short read length makes it hard to assemble a genome from scratch. New protocols can now generate overlapping pair-end reads. By joining the 3′ ends of each read pair, one can construct longer reads for assembly. However, effectively joining two overlapping pair-end reads remains a challenging task.
Results: In this article, we present an efficient tool called Connecting Overlapped Pair-End (COPE) reads, which connects overlapping pair-end reads using k-mer frequencies. We evaluated the tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which are, respectively, 10% and 2% higher than the figures for the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs have fewer errors and show a 14-fold improvement in N50 compared with contigs produced from unconnected reads.
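The idea of joining the 3′ ends of a read pair can be illustrated with a minimal sketch. It is not COPE's algorithm: the abstract says COPE uses k-mer frequencies, and that step is not reproduced here. The sketch only scans for a suffix-prefix overlap between the first read and the reverse complement of its mate. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the example sequences are all assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def connect_pair(read1, read2, min_overlap=8, max_mismatch_rate=0.0):
    """Join a read pair by a plain overlap scan (illustrative only).

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented first. Candidate overlaps are tried from the
    longest to the shortest; the first one whose mismatch rate is within
    max_mismatch_rate is accepted. Returns the connected read, or None
    if no acceptable overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail = read1[-olen:]   # 3' end of the first read
        head = mate[:olen]     # start of the reoriented mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olen:
            return read1 + mate[olen:]
    return None


# Hypothetical 30-base fragment sequenced as two 20-base reads
# that overlap by 10 bases in the middle.
fragment = "ACTACTTCAA" + "GGATCCGTAC" + "TTGACGATCA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
connected = connect_pair(read1, read2)
```

A scan like this accepts the first overlap that passes the threshold, so it can be misled by repeats or sequencing errors. That ambiguity is what makes connecting reads a challenging task, and it is where a tool such as COPE brings in additional evidence.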
Year: 2012
DOI: 10.1093/bioinformatics/bts563
Venue: BIOINFORMATICS
Field: Genome, Data mining, Hybrid genome assembly, Computer science, Genomics, Contig, Contig Mapping, Bioinformatics, Sequence assembly, k-mer
DocType: Journal
Volume: 28
Issue: 22
ISSN: 1367-4803
Citations: 12
PageRank: 1.04
References: 3
Authors: 11
Name            Order  Citations  PageRank
Binghang Liu    1      45         3.43
Jianying Yuan   2      45         3.43
Siu-ming Yiu    3      1026       92.90
Zhenyu Li       4      45         3.43
Yinlong Xie     5      23         1.67
Yanxiang Chen   6      51         6.23
Yujian Shi      7      45         3.43
Hao Zhang       8      56         4.88
Yingrui Li      9      554        72.28
Tak-Wah Lam     10     1860       164.96
Ruibang Luo     11     113        9.92