Title
A Context-Enhanced Transformer with Abbr-Recover Policy for Chinese Abbreviation Prediction
Abstract
Chinese abbreviation prediction is important for various natural language processing tasks such as query understanding and entity linking, since people tend to use a concise abbreviation rather than the full form (name) to mention an entity. Existing models make their predictions through sequence labeling, i.e., binary classification for each character (token) of the full form. However, they only leverage the semantics of the entity itself, overlooking both the label dependencies between tokens and the rich information in entity-related texts. In this paper, we propose a Context-Enhanced Transformer with an Abbr-Recover policy, namely CETAR, for Chinese abbreviation prediction. CETAR predicts the abbreviation sequence mainly through an iterative decoding process, in which each round consists of an abbreviation operation and a recovery operation. Our extensive experiments on both general-field and domain-specific datasets show that CETAR outperforms state-of-the-art baselines, including sequence labeling models and sequence generation models. Moreover, we have constructed a Chinese abbreviation dataset from the popular travel website Fliggy and share it at https://github.com/tolerancecky/abbr-0731. An online A/B test on the Fliggy search system shows that the predicted abbreviations improve the conversion rate by 2.03%.
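The abstract describes the iterative abbr-recover decoding only at a high level. The following is a minimal, hypothetical Python sketch of such a loop; the function iterative_abbr_recover and its drop_score/recover_score scorers are placeholder names of our own choosing, not the paper's API, and in CETAR the per-character decisions would come from the context-enhanced Transformer rather than the toy scorers shown here.

```python
# Hypothetical sketch of an iterative "abbreviate then recover" decoding
# loop, as described in the abstract. Each round first drops characters
# judged removable (abbreviation step), then restores characters the
# model prefers to keep after all (recovery step). The scorers are
# stand-ins for the real model's per-character probabilities.

from typing import Callable, List

def iterative_abbr_recover(
    chars: List[str],
    drop_score: Callable[[List[str], int], float],     # placeholder scorer
    recover_score: Callable[[List[str], int], float],  # placeholder scorer
    rounds: int = 3,
) -> str:
    """Return a predicted abbreviation for a full entity name."""
    keep = [True] * len(chars)  # start from the full form
    for _ in range(rounds):
        # Abbreviation step: drop characters the model marks removable.
        for i in range(len(chars)):
            if keep[i] and drop_score(chars, i) > 0.5:
                keep[i] = False
        # Recovery step: restore characters the model regrets dropping.
        for i in range(len(chars)):
            if not keep[i] and recover_score(chars, i) > 0.5:
                keep[i] = True
    return "".join(c for c, k in zip(chars, keep) if k)

# Toy usage with constant scorers: 复旦大学 (Fudan University) -> 复旦.
pred = iterative_abbr_recover(
    list("复旦大学"),
    drop_score=lambda cs, i: 0.9 if i in (2, 3) else 0.1,
    recover_score=lambda cs, i: 0.0,
)
print(pred)  # -> 复旦
```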
Year: 2022
DOI: 10.1145/3511808.3557074
Venue: Conference on Information and Knowledge Management
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 8
Name           Order  Citations  PageRank
Kaiyan Cao     1      0          0.34
Deqing Yang    2      0          0.34
Jingping Liu   3      0          0.68
Jiaqing Liang  4      0          0.68
Yanghua Xiao   5      482        54.90
Feng Wei       6      0          0.34
Baohua Wu      7      0          0.34
Quan Lu        8      2          1.78