Abstract | ||
---|---|---|
This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that an infrequent pattern which can be transformed to a frequent one is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision. |
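
The core idea in the abstract (an infrequent pattern that can be transformed into a frequent one is a likely annotation error) can be illustrated with a small sketch. This is not the authors' implementation: the paper mines general tree patterns with FREQT and expresses rules in a synchronous tree substitution grammar, whereas the sketch below assumes depth-one patterns (a parent label with its child labels) and treats "can be transformed" as "differs in exactly one child label". The function names, the nested-tuple tree encoding, and the thresholds `max_rare` and `min_frequent` are all hypothetical choices made for illustration.

```python
from collections import Counter

def productions(tree):
    """Yield (parent label, child labels) for each internal node.

    Trees are nested tuples, e.g. ("NP", ("DT", "the"), ("NN", "dog")).
    Preterminals such as ("DT", "the") are skipped.
    """
    label, children = tree[0], tree[1:]
    if children and isinstance(children[0], tuple):
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from productions(child)

def one_label_apart(p, q):
    """True if p and q share parent and arity and differ in exactly one child label."""
    return (p[0] == q[0] and len(p[1]) == len(q[1])
            and sum(a != b for a, b in zip(p[1], q[1])) == 1)

def extract_rules(treebank, max_rare=1, min_frequent=3):
    """Pair each rare pattern with the most frequent pattern one edit away.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter(p for tree in treebank for p in productions(tree))
    rules = []
    for rare, n in counts.items():
        if n > max_rare:
            continue
        candidates = [(m, q) for q, m in counts.items()
                      if m >= min_frequent and one_label_apart(rare, q)]
        if candidates:
            best = max(candidates, key=lambda c: c[0])[1]
            rules.append((rare, best))
    return rules

# Toy treebank: three consistent annotations and one suspicious one.
good = ("NP", ("DT", "the"), ("NN", "dog"))
bad = ("NP", ("DT", "the"), ("VB", "dog"))
toy_treebank = [good, good, good, bad]
rules = extract_rules(toy_treebank)
```

On the toy treebank, the single occurrence of `NP -> DT VB` is paired with the frequent `NP -> DT NN`, giving one candidate correction rule. A real system would still need a precision check on such rules, since rare patterns are not always errors.
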
Year | Venue | Keywords |
---|---|---|
2016 | LREC 2016 - Tenth International Conference on Language Resources and Evaluation | error correction, synchronous tree substitution grammar, FREQT |

Field | DocType | Citations |
---|---|---|
Computer science, Speech recognition, Natural language processing, Artificial intelligence, Treebank, Tree mining | Conference | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 2 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Kanta Suzuki | 1 | 0 | 0.68 |
Yoshihide Kato | 2 | 22 | 8.15 |
Shigeki Matsubara | 3 | 179 | 43.41 |