

Here We Go Again: Modern GEC Models Need Help with Spelling

Proceedings of the Institute for System Programming of the RAS. 2023. Vol. 35. No. 5. P. 215–228.
Starchenko V., Starchenko A.

This study examines how modern grammatical error correction (GEC) systems handle character-level errors. We discuss how these errors affect model performance and test how models of different architectures cope with them. We conclude that specialized GEC systems struggle to correct non-existent words, and that adding a simple spellchecker considerably improves a model's overall performance. To evaluate this, we assess the models on several datasets. In addition to the CoNLL-2014 validation dataset, we contribute a synthetic dataset with a higher density of character-level errors. Given that models generally achieve very high scores, we conclude that validation datasets with a higher density of tricky errors are a useful tool for comparing models. Lastly, we identify cases where non-existent words are treated incorrectly in the experts' annotation and contribute a cleaned version of this dataset. In contrast to specialized GEC systems, the LLaMA model used for the GEC task handles character-level errors well. We suggest that this better performance is explained by the fact that Alpaca is not extensively trained on annotated texts with errors, but instead receives grammatically and orthographically correct texts as input.
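The two ideas in the abstract, a spellchecking step placed before a GEC model and a synthetic dataset with denser character-level errors, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy vocabulary `VOCAB`, the function names `spellcheck` and `inject_char_errors`, the similarity cutoff, and the choice of adjacent-letter swaps as the only noise type are all assumptions made for illustration.

```python
import difflib
import random
import re

# Toy vocabulary (assumption); a real spellchecker would use a full dictionary.
VOCAB = sorted({"the", "model", "corrects", "spelling", "errors", "in", "text"})


def spellcheck(sentence, vocab=VOCAB, cutoff=0.75):
    """Replace out-of-vocabulary words with the closest vocabulary word.

    Words with no sufficiently similar candidate are left unchanged.
    Replacements are lowercase; case restoration is omitted for brevity.
    """
    def fix(match):
        word = match.group(0)
        if word.lower() in vocab:
            return word
        candidates = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        return candidates[0] if candidates else word

    return re.sub(r"[A-Za-z]+", fix, sentence)


def inject_char_errors(sentence, rate, rng):
    """Swap two adjacent inner letters in roughly `rate` of the longer words.

    A higher `rate` yields a higher density of character-level errors.
    """
    noisy = []
    for word in sentence.split():
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy.append(word)
    return " ".join(noisy)
```

In a pipeline of this kind, the output of `spellcheck` would be passed to a GEC model, while `inject_char_errors` (called with a seeded `random.Random` for reproducibility) would be applied to clean sentences to build a harder validation set.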

Research target: Philology and Linguistics; Computer Science
Language: English
Keywords: validation, preprocessing, spellchecker, GEC, spellcheck, synthetic datasets, grammatical error correction
Publication based on the results of:
Constituent structure and constituents' interpretation in the grammar architecture of the languages of Russian (2023)