Hybrid in-database inference for declarative information extraction - Citegraph

Paper Info

Title
Hybrid in-database inference for declarative information extraction

Abstract
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE process in a scalable manner. Recent work has proposed a probabilistic database (PDB) based declarative IE system that supports a leading statistical IE model, and an associated inference algorithm to answer top-k-style queries over the probabilistic IE outcome. Still, the broader problem of effectively supporting general probabilistic inference inside a PDB-based declarative IE system remains open. In this paper, we explore the in-database implementations of a wide variety of inference algorithms suited to IE, including two Markov chain Monte Carlo algorithms, the Viterbi and the sum-product algorithms. We describe the rules for choosing appropriate inference algorithms based on the model, the query and the text, considering the trade-off between accuracy and runtime. Based on these rules, we describe a hybrid approach to optimize the execution of a single probabilistic IE query to employ different inference algorithms appropriate for different records. We show that our techniques can achieve up to 10-fold speedups compared to the non-hybrid solutions proposed in the literature.

Year
2011

DOI
10.1145/1989323.1989378

Venue
SIGMOD Conference

Keywords
single probabilistic ie query, pdb-based declarative ie system, leading statistical ie model, hybrid in-database inference, different inference, ie process, declarative information extraction, appropriate inference, declarative ie system, probabilistic ie outcome, associated inference algorithm, ie task, conditional random field, viterbi, probabilistic database, probabilistic graphical models, conditional random fields, information extraction, query optimization, markov chain monte carlo

Field
Data mining, Computer science, Artificial intelligence, Probabilistic logic, Query optimization, Inference, Probabilistic analysis of algorithms, Information extraction, Graphical model, Probabilistic relevance model, Database, Machine learning, Probabilistic database

DocType
Conference

Citations
21

PageRank
1.08

References
21
Authors
5

Authors (5 rows)


Name	Order	Citations	PageRank
Daisy Zhe Wang	1	755	50.24
Michael J. Franklin	2	17423	1681.10
Minos Garofalakis	3	4904	664.22
Joseph M. Hellerstein	4	14093	1651.14
Michael L. Wick	5	204	12.89
