Title
Discovering emerging entities with ambiguous names
Abstract
Knowledge bases (KBs) contain data about a large number of people, organizations, and other entities. However, this knowledge can never be complete because the world keeps changing: new companies are formed every day and new songs are composed every minute, and many of them become candidates for addition to a KB. To keep up with real-world entities, the KB maintenance process needs to continuously discover newly emerging entities in news and other Web streams. In this paper we focus on the most difficult case, where the names of new entities are ambiguous. This raises the technical problem of deciding whether an observed name refers to a known entity or represents a new entity. This paper presents a method that solves this problem with high accuracy. It is based on a new model for measuring the confidence of mapping an ambiguous mention to an existing entity, and a new model for representing a new entity with the same ambiguous name as a set of weighted keyphrases. The method can handle both Wikipedia-derived entities, which typically constitute the bulk of large KBs, and entities that exist only in other Web sources such as online communities about music or movies. Experiments show that our entity discovery method outperforms previous methods for coping with out-of-KB entities (called unlinkable in entity linking).
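The two ideas named in the abstract (a confidence value for mapping a mention to an existing entity, and a keyphrase-based representation of a possible new entity) can be illustrated with a minimal sketch. This is not the paper's model: the overlap scoring, the normalized-score confidence, the `threshold` value, the `EMERGING` placeholder, and all entity names and keyphrase weights below are assumptions made up for illustration.

```python
EMERGING = "<EMERGING>"  # hypothetical placeholder for "a new, out-of-KB entity"


def overlap_score(context_tokens, keyphrases):
    """Sum the weights of keyphrases whose words all occur in the context."""
    context = set(context_tokens)
    return sum(
        weight
        for phrase, weight in keyphrases.items()
        if set(phrase.split()) <= context
    )


def link_or_emerging(context_tokens, candidates, emerging_keyphrases, threshold=0.6):
    """Pick a known entity or fall back to the emerging-entity placeholder.

    candidates: dict mapping entity name -> {keyphrase: weight}
    emerging_keyphrases: {keyphrase: weight} describing the possible new entity
    Confidence here is simply the winner's share of the total score
    (an assumption, not the confidence model from the paper).
    """
    scores = {
        name: overlap_score(context_tokens, phrases)
        for name, phrases in candidates.items()
    }
    scores[EMERGING] = overlap_score(context_tokens, emerging_keyphrases)

    total = sum(scores.values())
    if total == 0:
        # No evidence for any candidate: treat the mention as unlinkable.
        return EMERGING, 0.0

    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    if best != EMERGING and confidence < threshold:
        # A weakly supported known entity is not trusted.
        return EMERGING, confidence
    return best, confidence


# Toy data, invented for this example.
candidates = {
    "Prism (album)": {"katy perry": 1.0, "pop album": 0.8},
    "Prism (optics)": {"light refraction": 1.0, "glass": 0.5},
}
emerging = {"surveillance program": 1.0, "nsa": 0.9}

news = "the nsa surveillance program prism collects data".split()
music = "katy perry released the pop album prism".split()
print(link_or_emerging(news, candidates, emerging))
print(link_or_emerging(music, candidates, emerging))
```

In this toy setup the ambiguous name "prism" is resolved by the surrounding words alone: a context that matches only the emerging entity's keyphrases is flagged as new, while a context that clearly matches a known candidate is linked to it.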
Year
2014
DOI
10.1145/2566486.2568003
Venue
WWW
Keywords
kb maintenance process, existing entity, new company, ambiguous name, entity discovery method, new model, wikipedia-derived entity, new song, known entity, new entity, out-of-kb entity
Field
Entity linking, Data mining, World Wide Web, Information retrieval, Computer science, Weak entity, SGML entity
DocType
Conference
Citations
47
PageRank
1.41
References
23
Authors
3
Name              Order  Citations  PageRank
Johannes Hoffart  1      1362       52.62
Yasemin Altun     2      2463       150.46
Gerhard Weikum    3      12710      2146.01