Abstract |
---|
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE process in a scalable manner. Recent work has proposed a probabilistic database (PDB) based declarative IE system that supports a leading statistical IE model, and an associated inference algorithm to answer top-k-style queries over the probabilistic IE outcome. Still, the broader problem of effectively supporting general probabilistic inference inside a PDB-based declarative IE system remains open. In this paper, we explore the in-database implementations of a wide variety of inference algorithms suited to IE, including two Markov chain Monte Carlo algorithms as well as the Viterbi and sum-product algorithms. We describe the rules for choosing appropriate inference algorithms based on the model, the query and the text, considering the trade-off between accuracy and runtime. Based on these rules, we describe a hybrid approach that optimizes the execution of a single probabilistic IE query by employing different inference algorithms appropriate for different records. We show that our techniques can achieve up to 10-fold speedups compared to the non-hybrid solutions proposed in the literature. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1989323.1989378 | SIGMOD Conference |
Keywords | Field | DocType |
single probabilistic ie query,pdb-based declarative ie system,leading statistical ie model,hybrid in-database inference,different inference,ie process,declarative information extraction,appropriate inference,declarative ie system,probabilistic ie outcome,associated inference algorithm,ie task,conditional random field,viterbi,probabilistic database,probabilistic graphical models,conditional random fields,information extraction,query optimization,markov chain monte carlo | Data mining,Computer science,Artificial intelligence,Probabilistic logic,Query optimization,Inference,Probabilistic analysis of algorithms,Information extraction,Graphical model,Probabilistic relevance model,Database,Machine learning,Probabilistic database | Conference |
Citations | PageRank | References |
21 | 1.08 | 21 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daisy Zhe Wang | 1 | 755 | 50.24 |
Michael J. Franklin | 2 | 17423 | 1681.10 |
Minos Garofalakis | 3 | 4904 | 664.22 |
Joseph M. Hellerstein | 4 | 14093 | 1651.14 |
Michael L. Wick | 5 | 204 | 12.89 |