Title
Table extraction for answer retrieval
Abstract
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional retrieval techniques. This paper describes techniques for extracting tables from text and retrieving answers from the extracted information. We compare machine learning methods (in particular, Conditional Random Fields) and heuristic methods for table extraction. To retrieve answers, our approach creates a cell document for each table cell, containing the cell and its metadata (headers, titles); the retrieval model then ranks the cells of the extracted tables using a language-modeling approach. Performance is tested on government statistical Web sites and news articles, and errors are analyzed in order to improve the system.
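The cell-document idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that cells are ranked with "a language-modeling approach", so the query-likelihood scoring with Dirichlet smoothing used here is an assumption, as are the function names (`build_cell_documents`, `rank_cells`), the table layout, the smoothing parameter `mu`, and the made-up table values.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_cell_documents(table):
    """Create one 'cell document' per table cell.

    Each document holds the cell text plus its metadata: the table title,
    the row header, and the column header. The dict layout of `table` is
    hypothetical, chosen only for this illustration.
    """
    docs = []
    for r, row in enumerate(table["rows"]):
        for c, cell in enumerate(row):
            text = " ".join([
                table["title"],
                table["row_headers"][r],
                table["column_headers"][c],
                cell,
            ])
            docs.append({"row": r, "col": c, "cell": cell,
                         "tokens": tokenize(text)})
    return docs


def rank_cells(query, docs, mu=10.0):
    """Rank cell documents by query log-likelihood.

    Assumption: Dirichlet-smoothed unigram language models, with the
    collection model estimated from all cell documents together.
    """
    collection = Counter()
    for d in docs:
        collection.update(d["tokens"])
    total = sum(collection.values())

    scored = []
    for d in docs:
        tf = Counter(d["tokens"])
        length = len(d["tokens"])
        score = 0.0
        for term in tokenize(query):
            if collection[term] == 0:
                # Term unseen everywhere: it cannot change the ranking.
                continue
            p_coll = collection[term] / total
            score += math.log((tf[term] + mu * p_coll) / (length + mu))
        scored.append((score, d))
    # Sort on the score only, so the dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy table with made-up values, for illustration only.
table = {
    "title": "Population by state",
    "column_headers": ["1990", "2000"],
    "row_headers": ["Texas", "Ohio"],
    "rows": [["17.0", "20.9"], ["10.8", "11.4"]],
}
docs = build_cell_documents(table)
best_score, best_doc = rank_cells("texas 2000", docs)[0]
print(best_doc["cell"], best_doc["row"], best_doc["col"])
```

Because every cell document carries its own headers and title, a query that mentions a row label and a column label scores highest on the cell at their intersection, which is the behavior the abstract describes at a high level.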
Year: 2006
DOI: 10.1007/s10791-006-9005-5
Venue: Inf. Retr.
Keywords: Table extraction, Conditional random fields, Question answering, Information extraction
Field: Conditional random field, Data mining, Metadata, Heuristic, Question answering, Information retrieval, Computer science, System evaluation, Information extraction, Disk formatting, Table (information)
DocType: Journal
Volume: 9
Issue: 5
ISSN: 1386-4564
Citations: 14
PageRank: 0.89
References: 12
Authors: 3
Name
Order
Citations
PageRank
Xing Wei1114160.87
W. Bruce Croft2178122796.94
Andrew Kachites McCallum3192031588.22