Title
TCMEF: A TCM Entity Filter Using Less Text.
Abstract
When building a knowledge graph for a specific field, we often need to extract a subset of required entities from existing knowledge graphs or websites. In the area of Traditional Chinese Medicine (TCM), this means screening relevant entities from knowledge bases and websites. This paper proposes a three-phase TCM entity filter (TCMEF) that identifies TCM-related entities with high accuracy using only the text of very short entity titles rather than analyzing long document texts. The main component is a Short Text LSTM Classifier (STLC), which learns the text style of TCM terms from joint stroke and character features, without word segmentation. In addition, entities that represent person names, which are hard for STLC to classify, are picked out by a Person Name Filter (PNF) and further analyzed by a Rich Text Filter (RTF). The filter uses BaiduBaike and HudongBaike (the two largest Chinese encyclopedia websites) as its main data sources. TCMEF achieves an F1 score of 0.9275 in classification, which outperforms general word-based short text classification algorithms and is close to that of a Latent Dirichlet Allocation based model (LDA-SVM) that uses rich texts.
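The routing between the three components named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `tcmef_filter`, the toy stand-ins `toy_pnf`, `toy_rtf`, and `toy_stlc`, and the sample data are all hypothetical, and the order in which the phases run is an assumption inferred from the abstract's wording.

```python
from typing import Callable

# A classifier here is any callable that maps an entity title to True/False.
Classifier = Callable[[str], bool]


def tcmef_filter(
    title: str,
    is_person_name: Classifier,
    rich_text_filter: Classifier,
    short_text_classifier: Classifier,
) -> bool:
    """Return True if the entity title is judged to be TCM-related.

    Assumed routing: titles flagged as person names go to the rich-text
    filter; every other title is decided from the short title alone.
    """
    if is_person_name(title):
        return rich_text_filter(title)
    return short_text_classifier(title)


# Toy stand-ins (hypothetical): the real PNF, RTF, and STLC are learned or
# data-driven components, not lookups or suffix checks.
TOY_PERSON_NAMES = {"Li Shizhen", "Yao Ming"}
TOY_RICH_TEXT = {
    "Li Shizhen": "Ming dynasty physician and herbalist",
    "Yao Ming": "retired professional basketball player",
}


def toy_pnf(title: str) -> bool:
    # Stand-in for the Person Name Filter: a fixed set of known names.
    return title in TOY_PERSON_NAMES


def toy_rtf(title: str) -> bool:
    # Stand-in for the Rich Text Filter: keyword check on a longer description.
    return "herbalist" in TOY_RICH_TEXT.get(title, "")


def toy_stlc(title: str) -> bool:
    # Stand-in for the Short Text LSTM Classifier: a crude suffix rule.
    return title.lower().endswith(("decoction", "root"))


def demo(title: str) -> bool:
    # Convenience wrapper that wires the toy components together.
    return tcmef_filter(title, toy_pnf, toy_rtf, toy_stlc)
```

Calling `demo("Ginseng root")` exercises the short-title path, while `demo("Li Shizhen")` is routed through the person-name path. Passing the components as callables keeps the routing logic separate from whatever models fill each role.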
Year
2018
Venue
KSEM
Field
F1 score, Latent Dirichlet allocation, Knowledge graph, Computer science, Text segmentation, Natural language processing, Artificial intelligence, Encyclopedia, Statistical classification, Classifier (linguistics), Rich Text Format, Machine learning
DocType
Conference
Citations
0
PageRank
0.34
References
4
Authors
4
Name             Order   Citations   PageRank
Hualong Zhang    1       0           0.68
Shuzhi Cheng     2       0           0.34
Liting Liu       3       0           0.68
Wenxuan Shi      4       7           3.67