Massive genomic data processing and deep analysis

Paper Info

Title
Massive genomic data processing and deep analysis

Abstract
Today large sequencing centers are producing genomic data at the rate of 10 terabytes a day and require complicated processing to transform massive amounts of noisy raw data into biological information. To address these needs, we develop a system for end-to-end processing of genomic data, including alignment of short read sequences, variation discovery, and deep analysis. We also employ a range of quality control mechanisms to improve data quality and parallel processing techniques for performance. In the demo, we will use real genomic data to show details of data transformation through the workflow, the usefulness of end results (ready for use as testable hypotheses), the effects of our quality control mechanisms and improved algorithms, and finally performance improvement.

Year	2012
DOI	10.14778/2367502.2367534
Venue	PVLDB
Keywords	performance improvement, genomic data, massive genomic data processing, real genomic data, parallel processing technique, deep analysis, end-to-end processing, data transformation, data quality, noisy raw data, biological information, quality control mechanism
Field	Data mining, Data processing, Data quality, Terabyte, Computer science, Parallel processing, Raw data, Workflow, Database, Performance improvement
DocType	Journal
Volume	5
Issue	12
ISSN	2150-8097
Citations	3
PageRank	0.45
References	7
Authors	5

Authors (5 rows)


Name	Order	Citations	PageRank
Abhishek Roy	1	451	32.21
Yanlei Diao	2	2234	108.95
Evan Mauceli	3	14	1.48
Yiping Shen	4	51	6.29
Bai-Lin Wu	5	3	0.45
