Abstract

We describe an approach to Grammatical Error Correction (GEC) that is effective at making use of models trained on large amounts of weakly supervised bitext. We train the Transformer sequence-to-sequence model on 4B tokens of Wikipedia revisions and employ an iterative decoding strategy that is tailored to the loosely supervised nature of the Wikipedia training corpus. Finetuning on the Lang-8 corpus and ensembling yields an F0.5 of 58.3 on the CoNLL'14 benchmark and a GLEU of 62.4 on JFLEG. The combination of weakly supervised training and iterative decoding obtains an F0.5 of 48.2 on CoNLL'14 even without using any labeled GEC data.
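The iterative decoding strategy mentioned in the abstract amounts to applying the correction model repeatedly to its own output, so that corrections accumulate across passes. The sketch below is an illustration only: the `model.translate` interface, the fixed-point stopping rule, and the iteration cap are assumptions, not the paper's actual implementation (which the abstract does not specify).

```python
def iterative_decode(model, sentence, max_iters=5):
    """Repeatedly apply a correction model to its own output.

    Assumes a generic seq2seq `model.translate(text) -> text` interface
    (hypothetical). Stops when the output no longer changes (a fixed
    point, i.e. the model proposes no further edits) or after
    `max_iters` rounds.
    """
    current = sentence
    for _ in range(max_iters):
        corrected = model.translate(current)
        if corrected == current:
            break  # no further edits proposed; stop iterating
        current = corrected
    return current
```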
Year | Venue | DocType
---|---|---|
2018 | arXiv: Computation and Language | Journal

Volume | Citations | PageRank
---|---|---|
abs/1811.01710 | 0 | 0.34

References | Authors
---|---|
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Jared Lichtarge | 1 | 0 | 0.68 |
Christopher Alberti | 2 | 21 | 2.79 |
Shankar Kumar | 3 | 232 | 20.70 |
Noam Shazeer | 4 | 1089 | 43.70 |
Niki Parmar | 5 | 522 | 13.34 |