Here We Go Again: Modern GEC Models Need Help with Spelling
The study focuses on how modern GEC systems handle character-level errors. We discuss the ways these errors affect model performance and test how models of different architectures handle them. We conclude that specialized GEC systems struggle to correct non-existent words, and that a simple spellchecker considerably improves a model's overall performance. To evaluate this, we assess the models on several datasets. In addition to the CoNLL-2014 validation dataset, we contribute a synthetic dataset with a higher density of character-level errors and conclude that, given that models generally achieve very high scores, validation datasets with a higher density of tricky errors are a useful tool for comparing models. Lastly, we identify cases of incorrect treatment of non-existent words in the experts' annotations and contribute a cleaned version of this dataset.
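The spellchecking step mentioned above can be illustrated with a minimal sketch: detect tokens absent from a vocabulary (non-existent words) and replace them with their closest in-vocabulary match before the text reaches the GEC model. The tiny vocabulary and the `correct_nonwords` helper are illustrative assumptions, not the paper's actual setup.

```python
import difflib

# Toy vocabulary standing in for a real dictionary (illustrative only).
VOCAB = {"the", "model", "handles", "spelling", "errors", "well", "correction"}

def correct_nonwords(tokens, vocab=VOCAB):
    """Replace tokens absent from vocab with their closest vocabulary match.

    Tokens with no sufficiently similar match are left unchanged, so the
    downstream GEC model can still decide what to do with them.
    """
    out = []
    for tok in tokens:
        low = tok.lower()
        if low in vocab:
            out.append(tok)
        else:
            # difflib ranks candidates by similarity ratio; cutoff avoids
            # aggressive rewrites of genuinely unknown words.
            match = difflib.get_close_matches(low, vocab, n=1, cutoff=0.8)
            out.append(match[0] if match else tok)
    return out
```

For example, `correct_nonwords(["the", "modl", "handles", "speling", "errors"])` maps the non-words "modl" and "speling" to "model" and "spelling" while keeping in-vocabulary tokens intact.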
In contrast to specialized GEC systems, a GPT-3.5 model applied to the GEC task handles character-level errors well. We suggest that this better performance is explained by the fact that GPT-3.5 is not extensively trained on annotated erroneous texts but is instead trained predominantly on grammatically and orthographically correct text.