Title
Extracting Knowledge from Web Tables Based on DOM Tree Similarity.
Abstract
Extracting structured (or semi-structured) knowledge from Web tables is an important way to obtain high-quality knowledge. Unlike most extraction methods, which rely on external knowledge bases to understand the tables, our method uses the inherent similarities among tables to determine their semantic structure. After a comprehensive analysis of the various forms of table structure, we propose a novel method for computing the DOM tree similarity between Web tables based on dynamic time warping (DTW) and for clustering the tables. Using a corpus of 5,000 randomly extracted Wikipedia tables, our experiments show that the table clustering results are close to those of classification based on empirical approaches and that, without any external knowledge base, the quality of the knowledge extracted from the tables is satisfactory.
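The abstract does not say how DOM trees are encoded or how DTW is applied to them, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each table is flattened into its sequence of start tags in document order and that two tags cost 0 when equal and 1 otherwise. The names `TagCollector`, `tag_sequence`, `dtw_distance`, and `table_similarity` are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start tags in document order (assumed DOM encoding)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Flatten an HTML fragment into its sequence of tag names."""
    parser = TagCollector()
    parser.feed(html)
    return parser.tags


def dtw_distance(a, b):
    """Classic dynamic time warping with a 0/1 local cost on tag names."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def table_similarity(html_a, html_b):
    """Map the DTW distance to a similarity score in [0, 1] (assumed scaling)."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    if not a or not b:
        return 0.0
    return 1.0 - dtw_distance(a, b) / max(len(a), len(b))


one_row = "<table><tr><td>a</td><td>b</td></tr></table>"
two_rows = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>"
bullet_list = "<ul><li>a</li><li>b</li></ul>"

print(table_similarity(one_row, two_rows))
print(table_similarity(one_row, bullet_list))
```

Because DTW allows one element to align with several, tables that differ only in the number of repeated rows stay close, which is one plausible reason to prefer it over a strict edit distance here. The resulting pairwise scores could then feed any standard clustering routine.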
Year
2016
DOI
10.1007/978-3-319-47650-6_24
Venue
Lecture Notes in Artificial Intelligence
Keywords
Knowledge extraction, Web tables, DOM tree similarity, Table clustering
Field
Data mining, Information retrieval, Computer science, Knowledge extraction, Document Object Model, Web tables, Cluster analysis
DocType
Conference
Volume
9983
ISSN
0302-9743
Citations
1
PageRank
0.39
References
18
Authors
5
Name          Order  Citations  PageRank
Xiaolong Wu   1      128        18.86
Cungen Cao    2      309        58.63
Ya Wang       3      9          5.25
Jianhui Fu    4      2          0.75
Shi Wang      5      28         12.46