Harvesting facts from textual web sources by constrained label propagation

Paper Info

Title
Harvesting facts from textual web sources by constrained label propagation

Abstract
There have been major advances on automatically constructing large knowledge bases by extracting relational facts from Web and text sources. However, the world is dynamic: periodic events like sports competitions need to be interpreted with their respective timepoints, and facts such as coaching a sports team, holding political or business positions, and even marriages do not hold forever and should be augmented by their respective timespans. This paper addresses the problem of automatically harvesting temporal facts with such extended time-awareness. We employ pattern-based gathering techniques for fact candidates and construct a weighted pattern-candidate graph. Our key contribution is a system called PRAVDA based on a new kind of label propagation algorithm with a judiciously designed loss function, which iteratively processes the graph to label good temporal facts for a given set of target relations. Our experiments with online news and Wikipedia articles demonstrate the accuracy of this method.
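The abstract describes iterative label propagation over a weighted pattern-candidate graph. As a rough illustration only, the sketch below runs a generic, textbook-style label propagation with clamped seed labels on a tiny bipartite graph. It is not PRAVDA's algorithm and does not reproduce its loss function or temporal constraints, which this page does not specify. The function name `propagate_labels`, the node names, the edge weights, and the labels `coachOf` and `NONE` are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, labels, iterations=20):
    """Generic label propagation with clamped seeds (illustrative only).

    edges:  list of (node_a, node_b, weight) tuples, treated as undirected
    seeds:  dict mapping a node to its known label
    labels: list of all possible labels
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Unlabeled nodes start uniform; seed nodes start (and stay) one-hot.
    uniform = {l: 1.0 / len(labels) for l in labels}
    scores = {n: dict(uniform) for n in neighbors}
    for n, label in seeds.items():
        scores[n] = {l: 1.0 if l == label else 0.0 for l in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp seeds so their labels never change.
                updated[node] = scores[node]
                continue
            total = sum(w for _, w in nbrs)
            # Each node takes the weighted average of its neighbors' scores.
            updated[node] = {
                l: sum(w * scores[m][l] for m, w in nbrs) / total
                for l in labels
            }
        scores = updated

    # Report the highest-scoring label for every node in the graph.
    return {n: max(s, key=s.get) for n, s in scores.items()}


# Hypothetical toy graph: candidates c1..c3 linked to patterns p1, p2.
edges = [
    ("c1", "p1", 3.0),
    ("c3", "p1", 2.0),
    ("c2", "p2", 3.0),
    ("c3", "p2", 0.5),
]
seeds = {"c1": "coachOf", "c2": "NONE"}
result = propagate_labels(edges, seeds, labels=["coachOf", "NONE"])
print(result)
```

In this toy setup the unlabeled candidate `c3` is tied more strongly to the pattern shared with the positive seed, so it inherits that seed's label. A system like the one described would additionally need to attach timepoints or timespans to the labeled facts.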

Year	DOI	Venue
2011	10.1145/2063576.2063698	CIKM
Keywords	Field	DocType
sports team,harvesting fact,label propagation algorithm,business position,wikipedia article,good temporal fact,temporal fact,respective timespans,weighted pattern-candidate graph,sports competition,respective timepoints,textual web source,knowledge base,loss function	Data mining,Graph,Information retrieval,Label propagation,Computer science,Coaching,Artificial intelligence,Natural language processing,Periodic graph (geometry)	Conference
Citations	PageRank	References
35	1.44	25
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Yafang Wang	1	134	13.56
Bin Yang	2	706	34.93
Lizhen Qu	3	197	12.80
Marc Spaniol	4	897	61.13
Gerhard Weikum	5	12710	2146.01
