Title
Hybrid in-database inference for declarative information extraction
Abstract
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE process in a scalable manner. Recent work has proposed a probabilistic database (PDB) based declarative IE system that supports a leading statistical IE model, and an associated inference algorithm to answer top-k-style queries over the probabilistic IE outcome. Still, the broader problem of effectively supporting general probabilistic inference inside a PDB-based declarative IE system remains open. In this paper, we explore the in-database implementations of a wide variety of inference algorithms suited to IE, including two Markov chain Monte Carlo algorithms, the Viterbi and the sum-product algorithms. We describe the rules for choosing appropriate inference algorithms based on the model, the query and the text, considering the trade-off between accuracy and runtime. Based on these rules, we describe a hybrid approach to optimize the execution of a single probabilistic IE query to employ different inference algorithms appropriate for different records. We show that our techniques can achieve up to 10-fold speedups compared to the non-hybrid solutions proposed in the literature.
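Of the inference algorithms the abstract names, Viterbi is the one that answers top-1 queries over a chain-structured model. The block below is a minimal textbook sketch of it in plain Python, not the paper's in-database implementation. The function name `viterbi`, its parameters, and the dictionary-based score tables are assumptions made for illustration, and the sketch assumes a non-empty token sequence with scores already in log space.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_score, best_label_path) for a linear-chain model.

    Hypothetical interface: log_start[s], log_trans[p][s] and
    log_emit[s][token] are log-space scores supplied by the caller.
    Assumes obs is non-empty.
    """
    # best[t][s]: score of the best label sequence ending in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at position t

    for t in range(1, len(obs)):
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final state to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[-1][last], path
```

The recursion keeps only the best-scoring predecessor per state, so it runs in time linear in the sequence length and quadratic in the number of labels. Replacing `max` with a log-sum would give the forward pass of sum-product instead.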
Year
2011
DOI
10.1145/1989323.1989378
Venue
SIGMOD Conference
Keywords
single probabilistic IE query,PDB-based declarative IE system,leading statistical IE model,hybrid in-database inference,different inference,IE process,declarative information extraction,appropriate inference,declarative IE system,probabilistic IE outcome,associated inference algorithm,IE task,conditional random field,Viterbi,probabilistic database,probabilistic graphical models,conditional random fields,information extraction,query optimization,Markov chain Monte Carlo
Field
Data mining,Computer science,Artificial intelligence,Probabilistic logic,Query optimization,Inference,Probabilistic analysis of algorithms,Information extraction,Graphical model,Probabilistic relevance model,Database,Machine learning,Probabilistic database
DocType
Conference
Citations
21
PageRank
1.08
References
21
Authors
5
Name | Order | Citations | PageRank
Daisy Zhe Wang | 1 | 755 | 50.24
Michael J. Franklin | 2 | 17423 | 1681.10
Minos Garofalakis | 3 | 4904 | 664.22
Joseph M. Hellerstein | 4 | 14093 | 1651.14
Michael L. Wick | 5 | 204 | 12.89