Title | ||
---|---|---|
Using geometric structures to improve the error correction algorithm of high-throughput sequencing data on MapReduce framework |
Abstract | ||
---|---|---|
Next-generation sequencing (NGS) data are a rapidly growing example of big data and a source of new knowledge in science. However, sequencing errors remain unavoidable and reduce the quality of NGS data. Error correction, therefore, is a critical step in the successful utilization of NGS data, including de novo genome assembly and DNA resequencing. Since NGS throughput doubles approximately every five months and the length of NGS records (i.e., reads) is increasing, improvements in the efficiency and effectiveness of computational strategies are needed. In this study, we aim to improve the performance of CloudRS, an open-source MapReduce application designed to correct sequencing errors in NGS data. We introduce the read-message (RM) diagram to represent the set of messages, i.e., the key-value pairs generated on each read. We also present the Gradient-number Votes (GNV) scheme to trim off portions of the RM diagram, thereby reducing the total size of messages associated with each read. Experimental results show that the GNV scheme successfully reduces execution time and improves the quality of the de novo genome assembly. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/BigData.2014.7004306 | BigData Conference |
Keywords | Field | DocType |
mapreduce framework,geometric structures,ngs data,diagrams,next-generation sequencing,big data,next-generation sequencing data,error correction algorithm,mapreduce,readmessage diagram,genetics,cloudrs,geometric structure,gradient-number votes,gnv,rm diagram,bioinformatics,error correction,next generation sequencing | Data mining,Trim,Computer science,Error detection and correction,Execution time,DNA sequencing,Throughput,Big data,DNA Resequencing,Sequence assembly | Conference |
ISSN | Citations | PageRank |
2639-1589 | 0 | 0.34 |
References | Authors | |
---|---|---|
18 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wei-Chun Chung | 1 | 6 | 2.79 |
Yu-Jung Chang | 2 | 119 | 12.09 |
D.T. Lee | 3 | 627 | 78.14 |
Jan-Ming Ho | 4 | 950 | 106.64 |