TCMEF: A TCM Entity Filter Using Less Text. - Citegraph

Paper Info

Title
TCMEF: A TCM Entity Filter Using Less Text.

Abstract
When building a knowledge graph for a specific field, we often need to extract a subset of relevant entities from existing knowledge graphs or websites. In Traditional Chinese Medicine (TCM), this means screening relevant entities from knowledge bases and websites. This paper proposes a three-phase TCM entity filter (TCMEF) that identifies TCM-related entities with high accuracy using only the text of very short entity titles, rather than analyzing long document texts. The core of the method is a Short Text LSTM Classifier (STLC), which learns the text style of TCM terms from joint stroke and character features, without word segmentation. Entities that represent person names, which are hard for STLC to classify, are picked out by a Person Name Filter (PNF) and further analyzed by a Rich Text Filter (RTF). The filter uses BaiduBaike and HudongBaike, the two largest Chinese encyclopedia websites, as its main data sources. TCMEF achieves an F1 score of 0.9275 in classification, outperforming general word-based short text classification algorithms and approaching a Latent Dirichlet Allocation based model (LDA-SVM) that uses rich texts.
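
The routing between the three phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tcmef_filter` and the three toy stand-in predicates are hypothetical, and the real PNF, STLC, and RTF components are trained models whose details the abstract does not give. The sketch only shows the control flow: person names are diverted to the rich-text phase, and every other title is judged from its short text alone.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def tcmef_filter(
    title: str,
    is_person_name: Predicate,  # stands in for the Person Name Filter (PNF)
    stlc_is_tcm: Predicate,     # stands in for the Short Text LSTM Classifier (STLC)
    rtf_is_tcm: Predicate,      # stands in for the Rich Text Filter (RTF)
) -> bool:
    """Return True if the entity title is judged TCM related.

    Person names are hard to classify from the title alone, so they
    skip the short-text classifier and go to the rich-text phase.
    """
    if is_person_name(title):
        return rtf_is_tcm(title)
    return stlc_is_tcm(title)


# Toy stand-ins (assumptions for illustration only, not real models).
KNOWN_PERSON_NAMES = {"张仲景", "李白"}
TCM_RELATED_PEOPLE = {"张仲景"}


def toy_pnf(title: str) -> bool:
    return title in KNOWN_PERSON_NAMES


def toy_stlc(title: str) -> bool:
    # Crude proxy: many decoction names end with the character "汤".
    return title.endswith("汤")


def toy_rtf(title: str) -> bool:
    # A real RTF would analyze the entity's full encyclopedia page.
    return title in TCM_RELATED_PEOPLE


def classify(title: str) -> bool:
    return tcmef_filter(title, toy_pnf, toy_stlc, toy_rtf)
```

Passing the three phases in as callables keeps the routing logic independent of how each phase is actually built, so a trained classifier could replace any of the toy predicates without changing `tcmef_filter`.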

Year	Venue	Field
2018	KSEM	F1 score, Latent Dirichlet allocation, Knowledge graph, Computer science, Text segmentation, Natural language processing, Artificial intelligence, Encyclopedia, Statistical classification, Classifier (linguistics), Rich Text Format, Machine learning
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
4	4

Authors (4 rows)

Name	Order	Citations	PageRank
Hualong Zhang	1	0	0.68
Shuzhi Cheng	2	0	0.34
Liting Liu	3	0	0.68
Wenxuan Shi	4	7	3.67
