Title
Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms
Abstract
Domain-specific knowledge bases play an increasingly important role in a variety of real applications. In this paper, we use the product knowledge base of Taobao, the largest Chinese e-commerce platform, as an example to investigate a completion procedure for a domain-specific knowledge base. We argue that, without a continuous completion procedure in place, domain-specific knowledge bases tend to be incomplete and are oblivious to their own incompleteness. The key component of this completion procedure is the classification of emerging query terms into the corresponding properties of categories in the existing taxonomy. We propose using query logs to complete the product knowledge base of Taobao. However, query-driven completion faces several challenges, including distinguishing the fine-grained semantics of unrecognized terms and handling sparse data. We propose a graph-based solution to overcome these challenges. We first collect a large amount of positive evidence to establish the semantic similarity between terms. We then run a shortest-path search, or alternatively a random walk, on the similarity graph, under a set of constraints derived from negative evidence, to find the best candidate property for each emerging query term. Finally, we conduct extensive experiments on real data from Taobao and on a subset of CN-DBpedia. The results show that our solution classifies emerging query terms with good performance. Our solution is already deployed in Taobao, where it has helped find nearly 7 million new values for properties. The completed product knowledge base improves the ratio of recognized queries by more than 25% and the ratio of recognized terms by more than 32%.
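The shortest-path variant described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_term`, the toy graph, the property names, and the choice of `-log(similarity)` as the edge cost are all assumptions made for illustration. The sketch searches outward from an emerging term over a similarity graph and returns the property of the nearest labelled term whose property is not ruled out by negative evidence.

```python
import heapq
import math


def classify_term(graph, labels, term, forbidden=frozenset()):
    """Return (property, distance) of the nearest allowed labelled term, or None.

    graph:     dict mapping term -> {neighbour: similarity in (0, 1]}
    labels:    dict mapping already-known term -> property name
    forbidden: properties ruled out for `term` (stand-in for negative evidence)
    """
    dist = {term: 0.0}
    heap = [(0.0, term)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry, a shorter path was already found
        prop = labels.get(node)
        if prop is not None and prop not in forbidden:
            return prop, d
        # Nodes with a forbidden property are still traversed, just not returned.
        for nbr, sim in graph.get(node, {}).items():
            nd = d - math.log(sim)  # assumed cost: high similarity = short edge
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None


# Hypothetical toy data: "rose gold" is the emerging query term.
graph = {
    "rose gold": {"gold": 0.9, "rose": 0.5},
    "gold": {"rose gold": 0.9},
    "rose": {"rose gold": 0.5},
}
labels = {"gold": "color", "rose": "pattern"}

print(classify_term(graph, labels, "rose gold"))
print(classify_term(graph, labels, "rose gold", forbidden={"color"}))
```

In this toy graph the unconstrained call selects "color" via the stronger edge to "gold", while forbidding "color" makes the search fall back to "pattern". The random-walk alternative mentioned in the abstract is not covered by this sketch.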
Year: 2019
DOI: 10.1109/ICDE.2019.00129
Venue: 2019 IEEE 35th International Conference on Data Engineering (ICDE)
Keywords: Knowledge based systems,Semantics,User experience,Shape,Dogs,Painting
Field: Graph,Data mining,User experience design,Shortest path problem,Information retrieval,Random walk,Computer science,Knowledge-based systems,Knowledge base,Sparse matrix,Semantics
DocType: Conference
ISSN: 1084-4627
ISBN: 978-1-5386-7474-1
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name, Order, Citations, PageRank
Sihang Jiang, 1, 0, 1.69
Jiaqing Liang, 2, 37, 9.59
Yanghua Xiao, 3, 482, 54.90
Hai-Hong Tang, 4, 17, 4.76
Haikuan Huang, 5, 0, 1.35
Jun Tan, 6, 15, 1.61