Title
Grammatical Error Correction: More Data with More Context
Abstract
Grammatical Error Correction (GEC) suffers severely from a scarcity of data, both annotated and unannotated, as humans do not intentionally make grammatical errors. To address this, we exploit the plentiful unlabeled plain text that is available: we augment the training data with artificial noise and pre-train our model as a denoising autoencoder (DAE), an intuitive data augmentation approach for GEC. In a novel step, we enhance our DAE, a Transformer model, with a cross-document context mechanism, using a parallel encoder to encode the cross-document context and fusing the two encoders' representations in the decoder. Driven by document similarity metrics over any unlabeled plain text, this mechanism offers a new way to equip a GEC model with supplemental context, allowing it to glean grammatical information from a separate plain-text corpus. We evaluate our model on the CoNLL-2014 GEC Shared Task, achieving results that approach the state of the art for single models and showing strong potential given the abundance of available plain text.
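The DAE pre-training idea in the abstract, corrupting clean text with artificial noise and training the model to restore the original, can be sketched as follows. The specific noise operations and rates here (random token deletion and adjacent-token swaps) are illustrative assumptions, not the paper's exact recipe:

```python
import random

def add_noise(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Corrupt a clean token sequence to build a synthetic (noisy, clean)
    training pair for DAE-style pre-training. Noise types and rates are
    assumptions for illustration."""
    if rng is None:
        rng = random.Random(0)
    noisy = []
    i = 0
    while i < len(tokens):
        r = rng.random()
        if r < p_drop:
            i += 1  # delete this token
        elif r < p_drop + p_swap and i + 1 < len(tokens):
            noisy.extend([tokens[i + 1], tokens[i]])  # swap adjacent tokens
            i += 2
        else:
            noisy.append(tokens[i])  # keep the token unchanged
            i += 1
    return noisy

clean = "she goes to the store every day".split()
noisy = add_noise(clean, rng=random.Random(42))
# (noisy, clean) is one synthetic pair: the model learns to map noisy -> clean
```

In practice such pairs would be generated over a large plain-text corpus, letting the unlabeled data stand in for scarce human-annotated corrections.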
Year
2020
DOI
10.1109/IALP51396.2020.9310498
Venue
2020 International Conference on Asian Language Processing (IALP)
Keywords
grammatical error correction, transformer, data augmentation
DocType
Conference
ISSN
2159-1962
ISBN
978-1-7281-7690-1
Citations
0
PageRank
0.34
References
0
Authors
3
Name           Order  Citations  PageRank
Kevin Parnow   1      0          2.03
Zuchao Li      2      0          0.34
Hai Zhao       3      960        113.64