Title
End-to-End Conversion of HTML Tables for Populating a Relational Database
Abstract
Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: the program generates 198 Access-compatible CSV files at about 0.1 s per table (two of the 200 tables could not be indexed).
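The abstract names three steps: segmentation, extraction of category hierarchies, and construction of a canonical table. The sketch below illustrates how these steps fit together on a toy grid of cell strings. It is a minimal Python sketch under stated assumptions, not the authors' program: it replaces the paper's index-based segmentation with a hypothetical numeric-cell heuristic (the data region is taken to be the bottom-right block of numeric cells), fills blank spanning-header cells from the left (column headers) or from above (row stub), and emits one canonical record per data cell as CSV. The function names, the example table, and the generated attribute names `row_cat_k` and `col_cat_k` are illustrative only.

```python
import csv
import io
import re

# Hypothetical heuristic (NOT the paper's index-based method): a cell counts
# as data if, after removing thousands separators, it parses as a number.
_NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def is_data(cell: str) -> bool:
    return bool(_NUMBER.match(cell.strip().replace(",", "")))


def segment(grid: list[list[str]]) -> tuple[int, int]:
    """Return (r0, c0), the top-left corner of the assumed data region."""
    n_rows, n_cols = len(grid), len(grid[0])
    # The row stub ends at the first numeric cell of the last row.
    c0 = next((c for c in range(n_cols) if is_data(grid[-1][c])), None)
    if c0 is None:
        raise ValueError("last row contains no numeric data cells")
    # Header rows end at the first row that is numeric from column c0 onward.
    r0 = next((r for r in range(n_rows)
               if all(is_data(grid[r][c]) for c in range(c0, n_cols))), None)
    if r0 is None:
        raise ValueError("no fully numeric data row found")
    return r0, c0


def carry_forward(cells: list[str]) -> list[str]:
    """Copy each non-blank cell into the blank cells after it,
    approximating spanning (merged) header cells."""
    out, last = [], ""
    for cell in cells:
        last = cell if cell.strip() else last
        out.append(last)
    return out


def canonicalize(grid: list[list[str]]):
    """Return (header, records): one record per data cell holding its
    row-category path, column-category path, and value."""
    r0, c0 = segment(grid)
    n_rows, n_cols = len(grid), len(grid[0])
    # Column-category paths: propagate spanning headers to the right.
    col_hdr = [carry_forward(grid[r][c0:]) for r in range(r0)]
    # Row-category paths: propagate spanning row labels downward.
    stub = [carry_forward([grid[r][c] for r in range(r0, n_rows)])
            for c in range(c0)]
    # Attribute names: the stub header cell if present, else a made-up name.
    stub_names = grid[r0 - 1][:c0] if r0 > 0 else [""] * c0
    header = ([name.strip() or f"row_cat_{c + 1}"
               for c, name in enumerate(stub_names)]
              + [f"col_cat_{h + 1}" for h in range(r0)] + ["value"])
    records = []
    for i, r in enumerate(range(r0, n_rows)):
        row_path = [stub[c][i] for c in range(c0)]
        for j, c in enumerate(range(c0, n_cols)):
            col_path = [col_hdr[h][j] for h in range(r0)]
            records.append(row_path + col_path + [grid[r][c]])
    return header, records


def to_csv(header: list[str], records: list[list[str]]) -> str:
    """Serialize the canonical table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical example: two-level column header, one-column row stub.
EXAMPLE = [
    ["",        "Imports", "",     "Exports", ""],
    ["Country", "Food",    "Fuel", "Food",    "Fuel"],
    ["Canada",  "12",      "30",   "25",      "40"],
    ["Mexico",  "8",       "5",    "11",      "60"],
]

if __name__ == "__main__":
    print(to_csv(*canonicalize(EXAMPLE)), end="")
```

Because every data value becomes one row keyed by its full row and column category paths, the output loads into a single relational table. The numeric heuristic is the weakest assumption here: a table whose column headers are themselves numbers (for example, years) would be mis-segmented, which is one reason a more principled segmentation method is needed.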
Year
2014
DOI
10.1109/DAS.2014.9
Venue
Document Analysis Systems
Keywords
header cross-product, Wang category, header factoring, table segmentation, canonical relational table, table index, layout, indexing, relational databases, world wide web, relational database, internet, text analysis, html
Field
Row, Decision table, Information retrieval, Relational database, Computer science, Segmentation, Search engine indexing, Foreign key, Table (information), Database, Scalability
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
3
Name              Order  Citations  PageRank
George Nagy       1      913        105.94
Sharad C. Seth    2      671        93.61
David W. Embley   3      1915       480.08