Abstract | ||
---|---|---|
Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1093/bioinformatics/btt601 | BIOINFORMATICS |
Keywords | Field | DocType |
software design | Data mining,Data processing,Data set,Software design,Computer science,MIT License,Bioinformatics,Java,Database,Scalability,Scripting language | Journal |
Volume | Issue | ISSN |
30 | 1 | 1367-4803 |
Citations | PageRank | References |
33 | 1.45 | 7 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
André Schumacher | 1 | 71 | 7.26 |
Luca Pireddu | 2 | 100 | 10.01 |
Matti Niemenmaa | 3 | 65 | 3.91 |
Aleksi Kallio | 4 | 85 | 5.75 |
Eija Korpelainen | 5 | 103 | 8.95 |
Gianluigi Zanetti | 6 | 208 | 29.13 |
Keijo Heljanko | 7 | 751 | 47.90 |