A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics - Citegraph

Paper Info

Title
A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics

Abstract
Text analytics has become increasingly important with the rapid growth of text data. In particular, information extraction (IE), which extracts structured data from text, has received significant attention. Unfortunately, IE is often computationally intensive. To address this issue, MapReduce has been used for large-scale IE. Recently, efforts have emerged in both academia and industry to push IE inside DBMSs. This leads to an interesting and important question: given that both MapReduce and parallel DBMSs target large-scale analytics, which platform is the better choice for large-scale IE? In this paper, we propose a benchmark to systematically study the performance of both platforms on large-scale IE tasks. The benchmark includes both statistical-learning-based and rule-based IE programs, which have been used extensively in real-world IE tasks. We show how to express these programs on both platforms and conduct experiments on real-world datasets. Our results show that parallel DBMSs are a viable alternative for large-scale IE.

Year	2013
DOI	10.1145/2452376.2452448
Venue	EDBT
DocType	Conference
Keywords	large scale analytics, large scale, text analytics, ie program, parallel dbmss, important question, real-world ie task, performance comparison, large-scale text analytics, text data, large scale ie task, real-world datasets, sla, negotiation, pricing, accuracy, materialized views
Field	Data mining, Rule-based system, Text mining, Computer science, Data sharing, Information extraction, Statistical learning, Analytics, Materialized view, Data model, Database
Citations	2
PageRank	0.42
References	29
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Fei Chen	1	18	8.59
Meichun Hsu	2	3437	778.34
