Here We Go Again: Modern GEC Models Need Help with Spelling

Starchenko V., Starchenko A.

doi:10.15514/ISPRAS-2022-35(5)-14

Proceedings of the Institute for System Programming of the RAS. 2023. Vol. 35. No. 5. P. 215–228.

The study focuses on how modern GEC systems handle character-level errors. We discuss how these errors affect model performance and test how models of different architectures handle them. We conclude that specialized GEC systems do struggle to correct non-existent words, and that a simple spellchecker considerably improves the overall performance of a model. To evaluate this, we assess the models on several datasets. In addition to the CoNLL-2014 validation dataset, we contribute a synthetic dataset with a higher density of character-level errors, and conclude that, given that models generally achieve very high scores, validation datasets with a higher density of tricky errors are a useful tool for comparing models. Lastly, we note cases where non-existent words are treated incorrectly in the experts' annotation and contribute a cleaned version of this dataset. In contrast to specialized GEC systems, the LLaMA model used for the GEC task handles character-level errors well. We suggest that this better performance is explained by the fact that Alpaca is not extensively trained on annotated texts with errors but instead receives grammatically and orthographically correct texts as input.
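The abstract relies on two technical ingredients: a spellchecking pass run before a GEC model, and a synthetic validation set with a higher density of character-level errors. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The toy vocabulary `VOCAB`, the edit-distance-1 candidate generator (in the style of Peter Norvig's well-known spelling corrector), and the functions `spellcheck` and `add_char_noise` are hypothetical, since the abstract does not specify which spellchecker or noise model was used.

```python
import random
import string

# Hypothetical toy vocabulary; a real spellchecker would use a large lexicon
# with word frequencies. The abstract does not name the spellchecker used.
VOCAB = {"she", "goes", "to", "school", "every", "day", "the", "weather", "is", "nice"}

LETTERS = string.ascii_lowercase


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word: str, vocab: set[str]) -> str:
    """Replace a non-existent word with an in-vocabulary neighbour, if one exists."""
    lower = word.lower()
    # Known words and tokens with digits or punctuation are left untouched.
    if not lower.isalpha() or lower in vocab:
        return word
    # Alphabetical tie-break for determinism; a real system would rank by frequency.
    candidates = sorted(edits1(lower) & vocab)
    return candidates[0] if candidates else word


def spellcheck(sentence: str, vocab: set[str] = VOCAB) -> str:
    """Preprocessing pass meant to run before a (hypothetical) GEC model.

    Uses naive whitespace tokenisation and rejoins tokens with single spaces.
    """
    return " ".join(correct_word(tok, vocab) for tok in sentence.split())


def add_char_noise(sentence: str, rate: float, seed: int = 0) -> str:
    """Inject one character-level error into each word longer than two
    characters with probability `rate`, to build a denser synthetic set."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    noisy = []
    for tok in sentence.split():
        if len(tok) > 2 and rng.random() < rate:
            i = rng.randrange(len(tok))
            op = rng.choice(["delete", "substitute", "insert"])
            if op == "delete":
                tok = tok[:i] + tok[i + 1:]
            elif op == "substitute":
                # Pick a letter different from the original so the word always changes.
                tok = tok[:i] + rng.choice([c for c in LETTERS if c != tok[i]]) + tok[i + 1:]
            else:
                tok = tok[:i] + rng.choice(LETTERS) + tok[i:]
        noisy.append(tok)
    return " ".join(noisy)
```

For example, `spellcheck("She gose to scool every day")` returns `"She goes to school every day"`: the two non-existent words are mapped to their edit-distance-1 neighbours before any grammatical correction takes place. Raising `rate` in `add_char_noise` is one simple way to obtain the higher error density the abstract argues makes validation sets more discriminative.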

Research target: Philology and Linguistics, Computer Science

Language: English

Keywords: validation, preprocessing, spellcheck, spellchecker, GEC, grammatical error correction, synthetic datasets

Publication based on the results of:

Constituent structure and constituents' interpretation in the grammar architecture of the languages of Russia (2023)