Here We Go Again: Modern GEC Models Need Help with Spelling

Starchenko V., Starchenko A.

doi:10.15514/ISPRAS-2022-35(5)-14

Proceedings of the Institute for System Programming of the RAS. 2023. Vol. 35. No. 5. P. 215–228.

The study focuses on how modern GEC systems handle character-level errors. We discuss the ways these errors affect model performance and test how models of different architectures handle them. We conclude that specialized GEC systems struggle to correct non-existent words, and that a simple spellchecker considerably improves the overall performance of a model. To evaluate this, we assess the models on several datasets. In addition to the CoNLL-2014 validation dataset, we contribute a synthetic dataset with a higher density of character-level errors and conclude that, given that models generally achieve very high scores, validation datasets with a higher density of tricky errors are a useful tool for comparing models. Lastly, we notice cases of incorrect treatment of non-existent words in the experts' annotation and contribute a cleaned version of this dataset. In contrast to specialized GEC systems, the LLaMA model used for the GEC task handles character-level errors well. We suggest that this better performance is explained by the fact that Alpaca is not extensively trained on annotated texts with errors, but receives grammatically and orthographically correct texts as input.
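To make the two ideas in the abstract concrete, here is a minimal sketch in Python: one function injects character-level errors to raise their density in a text, and another runs a dictionary-based spellcheck pass before a GEC model would see the input. This is an illustration only, not the authors' dataset generator or spellchecker. The function names, the toy vocabulary `VOCAB`, and the `cutoff` value are all assumptions made for the example.

```python
import difflib
import random

# Toy vocabulary (assumption); a real spellchecker would use a full dictionary.
VOCAB = ["the", "model", "corrects", "spelling", "errors", "in", "text"]


def inject_char_errors(sentence, rate, rng):
    """Apply one random character edit to roughly `rate` of the longer words."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    noisy = []
    for word in sentence.split():
        if len(word) > 2 and rng.random() < rate:
            i = rng.randrange(len(word))
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                word = word[:i] + word[i + 1:]
            elif op == "insert":
                word = word[:i] + rng.choice(alphabet) + word[i:]
            elif op == "substitute":
                word = word[:i] + rng.choice(alphabet) + word[i + 1:]
            elif i < len(word) - 1:  # swap with the next character, if any
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy.append(word)
    return " ".join(noisy)


def spellcheck_prepass(sentence, vocab=VOCAB, cutoff=0.75):
    """Replace out-of-vocabulary words with the closest vocabulary entry."""
    fixed = []
    for word in sentence.split():
        if word.lower() in vocab:
            fixed.append(word)
            continue
        match = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        # Leave the word untouched when nothing in the vocabulary is close enough.
        fixed.append(match[0] if match else word)
    return " ".join(fixed)


if __name__ == "__main__":
    rng = random.Random(0)
    print(inject_char_errors("the model corrects spelling errors", 0.8, rng))
    print(spellcheck_prepass("the modle corects speling erors"))
    # second line prints: the model corrects spelling errors
```

In a setup like the one the abstract describes, the output of `spellcheck_prepass` would be passed to the GEC model in place of the raw text. The similarity matching from `difflib` stands in for whatever spellchecker is actually used.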

Research target: Philology and Linguistics; Computer Science

Language: English

Keywords: validation, preprocessing, spellchecker, GEC, spellcheck, synthetic datasets, grammatical error correction

Publication based on the results of:

Constituent structure and constituents' interpretation in the grammar architecture of the languages of Russia (2023)
