Abstract | ||
---|---|---|
This paper proposes a new method for correcting annotation errors in a treebank. The previous error correction method constructs a pseudo-parallel corpus in which incorrect partial parse trees are paired with correct ones, and extracts error correction rules from that corpus. Applying these rules to a treebank corrects its errors, but the method does not achieve wide coverage. To achieve wide coverage, our method takes a different approach: we treat an infrequent pattern as an annotation error pattern if it can be transformed into a frequent one. Using a tree mining technique, our method searches for such infrequent tree patterns and constructs error correction rules, each of which pairs an infrequent pattern with a corresponding frequent pattern. We conducted an experiment on the Penn Treebank. We obtained 1,987 rules that the previous method does not construct, and these rules achieved good precision. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1587/transinf.2016EDP7357 | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS |
Keywords | Field | DocType |
error correction, synchronous tree substitution grammar, FREQT | Annotation, Pattern recognition, Computer science, Error detection and correction, Natural language processing, Artificial intelligence, Syntax, Machine learning, Tree mining | Journal |
Volume | Issue | ISSN |
E100D | 5 | 1745-1361 |
Citations | PageRank | References |
0 | 0.34 | 7 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kanta Suzuki | 1 | 0 | 0.68 |
Yoshihide Kato | 2 | 22 | 8.15 |
Shigeki Matsubara | 3 | 179 | 43.41 |
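
The core idea in the abstract (an infrequent pattern that can be transformed into a frequent one is treated as an error pattern) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use FREQT or a synchronous tree substitution grammar. It assumes a much simpler setting: patterns are depth-one subtrees (a parent label plus its child labels), and the only allowed transformation is relabeling a single node. The names `patterns`, `one_label_apart`, `extract_rules`, `min_freq`, and `max_rare` are hypothetical and chosen for illustration only.

```python
from collections import Counter

# A tree is a tuple: (label, child_tree, child_tree, ...).
# A leaf is a tuple holding only its label, e.g. ("NN",).

def patterns(tree):
    """Yield a depth-one pattern (parent, child labels) for each internal node."""
    label, *children = tree
    if children:
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from patterns(child)

def one_label_apart(p, q):
    """True if p and q have the same shape and differ in exactly one label."""
    a = (p[0],) + p[1]
    b = (q[0],) + q[1]
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def extract_rules(treebank, min_freq, max_rare):
    """Pair each rare pattern with the most frequent pattern one relabeling away.

    Assumes min_freq > max_rare, so no pattern is both rare and frequent.
    """
    counts = Counter(p for tree in treebank for p in patterns(tree))
    frequent = [p for p, n in counts.items() if n >= min_freq]
    rules = []
    for p, n in counts.items():
        if n <= max_rare:
            targets = [q for q in frequent if one_label_apart(p, q)]
            if targets:
                best = max(targets, key=lambda q: counts[q])
                rules.append((p, best))  # (infrequent pattern, frequent pattern)
    return rules

# Toy treebank: five well-formed NPs and one NP with a suspicious VB child.
toy_treebank = [("NP", ("DT",), ("NN",))] * 5 + [("NP", ("DT",), ("VB",))]
toy_rules = extract_rules(toy_treebank, min_freq=3, max_rare=1)
```

On the toy treebank, `toy_rules` holds one rule that rewrites the pattern `NP -> DT VB` to `NP -> DT NN`. The frequency thresholds are arbitrary illustration values, and a realistic system would mine larger tree patterns and allow richer transformations than a single relabeling.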