Title | ||
---|---|---|
Memory-based one-step named-entity recognition: Effects of seed list features, classifier stacking, and unannotated data. |
Abstract | ||
---|---|---|
We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all three extensions attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German. |
Year | DOI | Venue |
---|---|---|
2003 | 10.3115/1119176.1119203 | CoNLL |
Keywords | Field | DocType |
final test,language-specific seed,training material,oneshot task,memory-based one-step named-entity recognition,classified unannotated data,conll-2003 shared task data,seed list feature,first-stage output error,overall f-rate,memory-based named-entity recognition system | Recognition system,Computer science,Speech recognition,Artificial intelligence,Natural language processing,Classifier (linguistics),Named-entity recognition,Machine learning,Stacking,German,Test set | Conference |
Citations | PageRank | References |
5 | 1.97 | 8 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Iris Hendrickx | 1 | 285 | 30.91 |
Antal van den Bosch | 2 | 1038 | 132.37 |