Title
QCluster: Extending Alignment-Free Measures with Quality Values for Reads Clustering.
Abstract
The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that now challenges the storage and data processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster, which improves the run time, memory requirements, and quality of post-processing steps such as assembly and error correction. Several alignment-free measures based on k-mer counts have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads, but with a high rate of erroneous bases (up to 15%). Thus, it will be essential to exploit quality value information within the alignment-free framework. In this paper we present a family of alignment-free measures, called D-q-type, that incorporate quality value information and k-mer counts for the comparison of read data. A set of experiments on simulated and real read data confirms that the new measures are superior to other classical alignment-free statistics, especially when erroneous reads are considered. These measures are implemented in a software tool called QCluster (http://www.dei.unipd.it/~ciompin/main/qcluster.html).
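The abstract does not give the D-q-type formulas, so the following is only a rough illustration of the idea of combining k-mer counts with quality values. It computes the classical D2 statistic (the inner product of two k-mer count vectors, a standard alignment-free measure) and a quality-weighted variant. The weighting scheme is an assumption for illustration, not necessarily the definition used in QCluster: each k-mer occurrence contributes the probability that all of its bases are correct, derived from Phred scores with an assumed ASCII offset of 33. All function names are hypothetical.

```python
from collections import Counter


def phred_to_prob_correct(qual: str, offset: int = 33) -> list[float]:
    # Phred score Q gives error probability p = 10^(-Q/10); return 1 - p per base.
    # The offset of 33 (Sanger/Illumina 1.8+ encoding) is an assumption.
    return [1.0 - 10 ** (-(ord(c) - offset) / 10) for c in qual]


def kmer_counts(seq: str, k: int) -> Counter:
    # Plain k-mer counts over all overlapping windows of length k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def quality_kmer_counts(seq: str, qual: str, k: int, offset: int = 33) -> dict[str, float]:
    # Hypothetical quality-weighted counts: each occurrence adds the product of
    # the per-base correctness probabilities inside its window.
    probs = phred_to_prob_correct(qual, offset)
    counts: dict[str, float] = {}
    for i in range(len(seq) - k + 1):
        weight = 1.0
        for p in probs[i:i + k]:
            weight *= p
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0.0) + weight
    return counts


def d2(x: dict, y: dict) -> float:
    # D2 statistic: sum over shared k-mers of the product of their (weighted) counts.
    return sum(v * y[w] for w, v in x.items() if w in y)
```

For example, `d2(kmer_counts("ACGTACGT", 3), kmer_counts("ACGTTCGT", 3))` returns 6 (shared 3-mers ACG with counts 2 and 1, CGT with counts 2 and 2). Giving a low-quality base to one read, e.g. quality string `"I#IIIIII"`, lowers the weighted score for the k-mers that cover that base, which is the kind of effect a quality-aware measure is meant to capture.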
Year
2014
DOI
10.1007/978-3-662-44753-6_1
Venue
ALGORITHMS IN BIOINFORMATICS
Keywords
alignment-free measures, reads quality values, clustering reads
Field
Data mining, Combinatorics, Pace, Data processing, Computer science, Error detection and correction, Cluster analysis, Data complexity
DocType
Conference
Volume
8701
ISSN
0302-9743
Citations
2
PageRank
0.36
References
17
Authors
3
Name | Order | Citations | PageRank
Matteo Comin | 1 | 191 | 20.94
Andrea Leoni | 2 | 2 | 0.36
Michele Schimd | 3 | 16 | 1.31