Title
Knowledge base completion via search-based question answering
Abstract
Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could ask the query "who is the mother of Frank Zappa". However, this is likely to return "The Mothers of Invention", which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa's place of birth, in order to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases, asking too many can hurt accuracy (by introducing false positives). We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values for each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
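The query-augmentation and answer-aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how queries are scored or how answers are combined, so the sketch assumes simple score summing followed by normalization. The function names `build_queries` and `aggregate`, the candidate scores, and the placeholder answer string are all hypothetical.

```python
from collections import defaultdict


def build_queries(subject, relation_phrase, extra_terms):
    """Return a base query plus one query per disambiguating term."""
    base = f"{relation_phrase} {subject}"
    return [base] + [f"{base} {term}" for term in extra_terms]


def aggregate(candidates_per_query):
    """Sum candidate scores across queries and normalize them.

    candidates_per_query: list of dicts mapping candidate answer -> score.
    Returns a dict mapping candidate answer -> normalized score.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_query:
        for answer, score in candidates.items():
            totals[answer] += score
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {answer: score / norm for answer, score in totals.items()}


# Assumption: the place of birth (Baltimore) is used as the disambiguating term.
queries = build_queries("Frank Zappa", "mother of", ["Baltimore"])

# Hypothetical scores standing in for what a QA system might return per query;
# "(mother's name)" is a placeholder, not an extracted fact.
results = [
    {"The Mothers of Invention": 0.6, "(mother's name)": 0.4},
    {"(mother's name)": 0.9, "The Mothers of Invention": 0.1},
]
ranking = aggregate(results)
```

In this toy input the unaugmented query favors the band name, while the augmented query shifts the combined score toward the intended answer, which is the effect the abstract attributes to disambiguating terms.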
Year
2014
DOI
10.1145/2566486.2568032
Venue
WWW
Keywords
search-based question answering, world knowledge, web-search-based question-answering technology, knowledge base completion, aggregate candidate answer, available knowledge base, knowledge base, entity attribute, search engine, known ethnicity, frank zappa, search result, freebase, information extraction
Field
Data mining, Computer science, Artificial intelligence, Knowledge base, Probabilistic logic, World Wide Web, Ask price, Place of birth, Leverage (finance), Question answering, Information extraction, Machine learning, False positive paradox
DocType
Conference
Citations
73
PageRank
1.85
References
15
Authors
6
Name
Order
Citations
PageRank
West, Robert151841.98
Evgeniy Gabrilovich24573224.48
Michael Kuperberg37589529.66
Shaohua Sun462216.73
Rahul Gupta5792.28
Dekang Lin65036388.45