SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop. - Citegraph

Paper Info

Title
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

Abstract
Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.

Year	DOI	Venue
2014	10.1093/bioinformatics/btt601	BIOINFORMATICS
Keywords	Field	DocType
software design	Data mining, Data processing, Data set, Software design, Computer science, MIT License, Bioinformatics, Java, Database, Scalability, Scripting language	Journal
Volume	Issue	ISSN
30	1	1367-4803
Citations	PageRank	References
33	1.45	7
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
André Schumacher	1	71	7.26
Luca Pireddu	2	100	10.01
Matti Niemenmaa	3	65	3.91
Aleksi Kallio	4	85	5.75
Eija Korpelainen	5	103	8.95
Gianluigi Zanetti	6	208	29.13
Keijo Heljanko	7	751	47.90
