Title
Effective Document-Level Features For Chinese Patent Word Segmentation
Abstract
A patent is a property right for an invention granted by the government to the inventor. Patents often have a high concentration of scientific and technical terms that are rare in everyday language. However, some scientific and technical terms usually appear with high frequency only in one specific patent. In this paper, we propose a pragmatic approach to Chinese word segmentation on patents where we train a sequence labeling model based on a group of novel document-level features. Experiments show that the accuracy of our model reached 96.3% (F-1 score) on the development set and 95.0% on a held-out test set.
Year
2014
Venue
PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2
Field
F1 score, Property rights, Sequence labeling, Computer science, Text segmentation, Speech recognition, Natural language processing, Artificial intelligence, Invention, Government, Test set
DocType
Conference
Volume
P14-2
Citations
2
PageRank
0.40
References
17
Authors
2
Name, Order, Citations, PageRank
Si Li, 1, 14, 7.29
Nianwen Xue, 2, 1654, 117.65