Title | ||
---|---|---|
Description of the NCU Chinese Word Segmentation and Named Entity Recognition System for SIGHAN Bakeoff 2006 |
Abstract | ||
---|---|---|
Asian languages, Chinese in particular, differ from most Western languages in that words are not separated in written text. A preliminary step in processing such languages is therefore to find the boundaries between words. In this paper, we present a general-purpose model for both Chinese word segmentation and named entity recognition. The model is built on word sequence classification with a probabilistic model, namely conditional random fields (CRF). We used a simple feature set for the CRF, which achieves satisfactory classification results on both tasks. Our model achieved an F-score of 91.00 on the UPUC Treebank data and 78.71 on the NER task. |
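
The abstract frames segmentation as sequence classification, which is commonly done by giving each character a position tag and letting the classifier predict those tags. The sketch below illustrates only that encoding and decoding step. The B/I/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for illustration; the abstract does not state which tag scheme or features the authors used, and no CRF training is shown.

```python
def words_to_tags(words):
    """Map segmented words to one tag per character.

    Assumed tag set: B = begin, I = inside, E = end, S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their (predicted) tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # keep trailing characters if the tag sequence ends mid-word
        words.append(current)
    return words


words = ["中文", "分词", "很", "有趣"]
tags = words_to_tags(words)
restored = tags_to_words("".join(words), tags)
```

A sequence labeler such as a CRF would be trained to predict `tags` from the character sequence; `tags_to_words` then turns its predictions back into a segmentation.
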
Year | Venue | Field |
---|---|---|
2006 | SIGHAN@COLING/ACL | Conditional random field, Tokenization (data security), Word lists by frequency, Phrase chunking, Computer science, Word error rate, Text segmentation, Speech recognition, Artificial intelligence, Natural language processing, Classifier (linguistics), Named-entity recognition |
DocType | Citations | PageRank |
Conference | 9 | 0.52 |
References | Authors | |
7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yu-Chieh Wu | 1 | 247 | 23.16 |
Jie-Chi Yang | 2 | 350 | 43.91 |
Qian-Xiang Lin | 3 | 10 | 0.90 |