Abstract |
---|
A patent is a property right for an invention, granted by the government to the inventor. Patents often have a high concentration of scientific and technical terms that are rare in everyday language. Some of these terms, however, appear with high frequency in only one specific patent. In this paper, we propose a pragmatic approach to Chinese word segmentation on patents in which we train a sequence labeling model based on a group of novel document-level features. Experiments show that our model reaches an F1 score of 96.3% on the development set and 95.0% on a held-out test set. |
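As a rough illustration of the idea in the abstract, the sketch below casts segmentation as per-character tagging and attaches one document-level feature to each character. The BMES tag scheme, the function names, and the bucketed bigram count are all assumptions made for illustration; they are not taken from the paper, whose actual feature set is not described in this record.

```python
from collections import Counter


def words_to_tags(words):
    """Convert a segmented sentence into per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def doc_ngram_counts(sentences, n=2):
    """Count character n-grams over every sentence of ONE document."""
    counts = Counter()
    for s in sentences:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


def char_features(sentence, i, doc_counts, n=2):
    """Features for character i: local context plus a document-level count."""
    gram = sentence[i:i + n]
    # An n-gram that runs past the sentence end gets a count of zero.
    freq = doc_counts[gram] if len(gram) == n else 0
    return {
        "char": sentence[i],
        "prev": sentence[i - 1] if i > 0 else "<s>",
        "next": sentence[i + 1] if i + 1 < len(sentence) else "</s>",
        # Hypothetical bucketing: cap the in-document count at 3.
        "doc_freq_bucket": min(freq, 3),
    }


# Toy "document" of two unsegmented sentences (made-up example text).
doc = ["光纤传感器", "该光纤很细"]
counts = doc_ngram_counts(doc)
features = [char_features(doc[0], i, counts) for i in range(len(doc[0]))]
gold_tags = words_to_tags(["光纤", "传感器"])
```

Feature dictionaries of this shape, paired with the gold tags, could be passed to any sequence labeler such as a CRF. The point of the document-level count is that a term repeated within one patent receives a signal that sentence-local features alone would miss.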
Year | Venue | Field |
---|---|---|
2014 | Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 2 | F1 score, Property rights, Sequence labeling, Computer science, Text segmentation, Speech recognition, Natural language processing, Artificial intelligence, Invention, Government, Test set |
DocType | Volume | Citations |
---|---|---|
Conference | P14-2 | 2 |
PageRank | References | Authors |
---|---|---|
0.40 | 17 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Si Li | 1 | 14 | 7.29 |
Nianwen Xue | 2 | 1654 | 117.65 |