Large-scale cross-document coreference using distributed inference and hierarchical models - Citegraph

Paper Info

Title
Large-scale cross-document coreference using distributed inference and hierarchical models

Abstract
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.

Year	Venue	Keywords
2011	ACL	cross-document coreference, large scale processing, effective approximate inference, automated knowledge base construction, wikipedia entity, large collection, large-scale cross-document coreference, hierarchical model, web page, large dataset, inference technique
Field	DocType	Volume
Coreference, Web page, Inference, Computer science, Approximate inference, Information extraction, Natural language processing, Artificial intelligence, Knowledge base, Hierarchical database model, Machine learning, Scalability	Conference	P11-1
Citations	PageRank	References	Authors
65	2.36	32	4

Authors (4 rows)

Name	Order	Citations	PageRank
Sameer Singh	1	1060	71.63
Amarnag Subramanya	2	422	24.53
Fernando Pereira	3	17717	2124.79
Andrew Kachites McCallum	4	19203	1588.22
