Title
Denoising of Quality Scores for Boosted Inference and Reduced Storage
Abstract
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Much of the raw data consists of nucleotides and the corresponding quality scores that indicate their reliability. The latter are more difficult to compress and are themselves noisy. Lossless and lossy compression of the quality scores has recently been proposed to alleviate storage costs, but reducing the noise in the quality scores has remained largely unexplored. These raw data are processed to identify variants, and the resulting genetic variants are used in important applications such as medical decision making. Thus, improving the performance of variant calling by reducing the noise contained in the quality scores is important. We propose a denoising scheme that reduces the noise of the quality scores, and we demonstrate improved inference with the denoised data. Specifically, we show that replacing the quality scores with those generated by the proposed denoiser generally results in more accurate variant calling. Moreover, a consequence of the denoising is that the produced quality scores have lower entropy, so significant compression can be achieved relative to lossless compression of the original quality scores. We expect our results to provide a baseline for future research on denoising of quality scores. The code used in this work, as well as a Supplement with all the results, is available at http://web.stanford.edu iochoa/DCCdenoiser_CodeAndSupplement.zip.
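The abstract links lower entropy of the denoised quality scores to better compression. The sketch below illustrates only that link, under stated assumptions: it estimates the zeroth-order empirical entropy (roughly the bits per score an order-0 entropy coder needs) of a toy list of Phred-style scores, then applies a hypothetical coarse binning, `bin_scores`, as a stand-in for a denoiser. Binning is NOT the denoiser proposed in the paper; it is used because any many-to-one mapping of scores cannot increase empirical entropy. The function names, bin width, and scores are all made up for illustration.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(symbols)
    counts = Counter(symbols)
    # Sum of p * log2(1/p) over the observed symbol frequencies.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def bin_scores(scores, bin_width=10):
    """Hypothetical stand-in for a denoiser: map each score to the lower edge of its bin.

    This is NOT the paper's denoiser; it only shows that a many-to-one
    mapping of quality scores cannot raise their empirical entropy.
    """
    return [(q // bin_width) * bin_width for q in scores]


# Toy Phred-style quality scores, invented for illustration.
raw = [38, 40, 37, 41, 39, 12, 40, 38, 35, 41]
smoothed = bin_scores(raw)

print(f"raw:      {empirical_entropy(raw):.3f} bits/score")
print(f"smoothed: {empirical_entropy(smoothed):.3f} bits/score")
```

With these invented scores the estimate drops from about 2.72 to about 1.36 bits per score. Real denoisers trade this reduction against the accuracy of downstream variant calling, which a pure entropy calculation does not capture.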
Year: 2016
DOI: 10.1109/DCC.2016.92
Venue: 2016 Data Compression Conference (DCC)
Keywords: quality score denoising, storage reduction, sequencing data, raw data, genetic variants, denoising scheme, denoised data, denoiser results
Field: Noise reduction, Data mining, Medical decision making, Lossy compression, Noise measurement, Inference, Computer science, Raw data, Distortion, Lossless compression
DocType: Conference
Volume: 2016
ISSN: 1068-0314
ISBN: 978-1-5090-1854-3
Citations: 0
PageRank: 0.34
References: 9
Authors: 5
Name, Order, Citations, PageRank
Idoia Ochoa, 1, 70, 13.10
Mikel Hernaez, 2, 65, 12.68
R. L. Goldfeder, 3, 8, 1.89
Tsachy Weissman, 4, 1192, 119.50
Euan A. Ashley, 5, 14, 2.86