Distractor Generation for Lexical Questions Using Learner Corpus Data

Jazykovedny Casopis. 2023. Vol. 74. No. 1. P. 345–356.
Nikita Login

Learner corpora with error annotation can serve as a source of data for automated question generation (QG) in language testing. For multiple-choice gap-fill lexical questions, the process involves two steps. The first is to extract sentences with lexical corrections from the learner corpus. The second, which is the focus of this paper, is to generate distractors for the retrieved questions. The presented approach, called DisSelector, is based on supervised learning on specially annotated learner corpus data. For each sentence, a list of distractor candidates was retrieved, and each candidate was manually labelled as a plausible or implausible distractor. The resulting set of examples was additionally filtered by a set of lexical and grammatical rules and then split into training and testing subsets in a 4:1 ratio. Several classification models, including classical machine learning algorithms and gradient boosting implementations, were trained on the data. Word and sentence vectors from language models, together with corpus word frequencies, were used as input features for the classifiers. The highest F1-score (0.72) was attained by an XGBoost model. Various configurations of DisSelector showed improvements over the unsupervised baseline in both automatic and expert evaluation. DisSelector was integrated into the open-source language testing platform LangExBank as a microservice with a REST API.
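The evaluation setup mentioned in the abstract (a 4:1 train/test split of labelled distractor candidates, scored with F1) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `split_4_to_1` and `f1_score`, the toy candidates, and the fixed seed are all assumptions made for illustration, and no classifier is trained here.

```python
import random


def split_4_to_1(examples, seed=0):
    """Shuffle examples and split them into train/test subsets in a 4:1 ratio."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # first four fifths go to training
    return shuffled[:cut], shuffled[cut:]


def f1_score(gold, predicted):
    """F1 for the positive class (1 = plausible distractor, 0 = implausible)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labelled candidates: (candidate word, plausibility label).
examples = [(f"candidate_{i}", i % 2) for i in range(10)]
train, test = split_4_to_1(examples)
```

In a real pipeline the predictions passed to `f1_score` would come from a trained classifier such as the gradient boosting models the abstract mentions; here the function only shows how the reported metric is computed from gold and predicted labels.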

Research target: Philology and Linguistics
Language: English
Keywords: learner corpora, distractor generation, automated question generation
Publication based on the results of:
Second-language acquisition modelling within different frameworks of existing theories on the basis of learner corpora platforms for experiments and computer tools (2023)
Similar publications
Juxtapositional vs. possessive-like encoding in Russian specificational constructions
Logvinova N., Russian linguistics 2026 Vol. 50 Article 11
This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOM honor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...
Added: May 18, 2026
КОГНИТИВНО-АССОЦИАТИВНОЕ ПОЛЕ ОНИМОВ САНКТ-ПЕТЕРБУРГА И ВЕНЫ
Зелинская Ю. Ю., Когнитивные исследования языка 2025 No. 4(65) P. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Лично-числовая асимметрия: согласование пассивных миративов в казымском диалекте хантыйского языка
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Vol. 6 No. 1 P. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Глаголы перемещения веществ в славянских языках
Fedorov D., Jezikoslovni Zapiski 2026 No. 32(1) P. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
Образ женщины сквозь года: диахронический анализ репрезентации женщин в российской агитационной рекламе
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Vol. 23 No. 1 P. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
«Плоский мир» Т. Пратчетта глазами русскоязычного фандома
Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 No. 100 P. 158–173
This is the first attempt to examine the features of fan fiction as an act of productive reception that arose in Russia around Terry Pratchett's cycle of Discworld novels. The analysis shows that fan fiction authors seek above all to convey the style and comic element of Pratchett's original cycle, regardless of the genre and format of the works they create. Fic writers most often turn to formats such as ...
Added: May 10, 2026
Вселенная Достоевского
Pershkina A., М.: Альпина нон-фикшн, 2026.
Philologist Anastasia Pershkina explains how the writer created his world, whom he populated it with, what laws he established, and why this world affects us so vividly. The reader will also learn who helped Fyodor Mikhailovich in his work, how the writer linked his works to one another, what his contemporaries thought of his texts, and what "dostoevshchina" actually is. ...
Added: May 6, 2026
The hypothesis of dependence of the lexical nature of mixed languages on the patterns of their emergence
Gridneva E., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2026 No. 100 P. 38–52
This study investigates mixed languages, with a specific focus on their lexical characteristics. It proposes and substantiates the hypothesis that the degree of lexical mixing in such languages — reflected in the prevalence of doublets and the distribution of vocabulary between source languages — is linked to the specific pattern of their emergence, rather than ...
Added: May 6, 2026
Арест писателя Гюнтера Хофе на франкфуртской книжной ярмарке в 1963 г.: конкурирующие образы в медийном пространстве ГДР и ФРГ
Керимов Р. Э., Новое прошлое 2026 No. 1 P. 148–162
The arrest of East German writer and publishing director Günter Hofé at the 1963 Frankfurt Book Fair became a unique episode of ideological confrontation between East and West Germany. Hofé is primarily known for his documentary-fiction trilogy about World War II, in which he actively participated as a Wehrmacht soldier. The analysis of the writer’s ...
Added: May 5, 2026
Семантический ореол сакрального в четырехстопном амфибрахии: механизмы культурной памяти в поэзии Ольги Седаковой
Максимов И. В., Новый филологический вестник 2025 Vol. 73 No. 2 P. 187–196
The majority of studies on the metrical aspects of Olga Sedakova’s poetry focus on the formal elements of versification, rarely exploring the substantive possibilities of the chosen metres. This paper fills this gap by analyzing the unified narrative of the four-foot amphibrach, tracing its development in Russian poetry from V.A. Zhukovsky to O.A. Sedakova. At ...
Added: May 5, 2026
L1 Influence on the Use of the English Present Perfect: A Corpus Analysis of Russian and Spanish Learners’ Essays
Perez-Guerra J., Smirnova E. A., Journal of Language and Education 2024 Vol. 10 No. 1 P. 101–114
Mastering verbal tenses, especially those expressing aspect, in a second language presents a challenge as learners frequently link the semantic nuances of verbal forms in their second language (L2) to the characteristics of the verbal systems in their native languages (L1). This study explores the impact of L1 on the usage of the English Present ...
Added: March 3, 2024
Обработка слов с частотными орфографическими ошибками (исследование на базе учебного корпуса английского языка)
Klimova M., Viklova A., Overnikova D., Вестник Санкт-Петербургского университета. Язык и литература 2023 Vol. 20 No. 4 P. 824–837
The article presents an experimental study of the influence of the frequency of spelling errors in a word on its representation in mental lexicon. The hypothesis that frequently misspelled words cause difficulties in reading even if they are written correctly has been proved for native speakers of Russian and English. This paper aims to check ...
Added: January 26, 2024
Устный учебный корпус РКИ: новый источник данных для лингвистических и методических исследований
Vlasova E., Бец Ю. В., Северина Е. М., in: «Русская грамматика в диалоге научных школ, направлений, методов».: Владивосток: Издательство ДВФУ, 2022.
The article analyses non-trivial phonetic and grammatical phenomena in the spoken language of foreigners learning Russian. It shows that a spoken learner corpus makes it possible to obtain a systematic picture of compensatory mechanisms in speech production and to test and formulate hypotheses. ...
Added: November 8, 2023
Аннотирование учебного корпуса в аспекте его использования для исследовательских задач
Klimova M., Viklova A., Overnikova D., in: Современная лингвистика: от теории к практике. III Казанский международный лингвистический саммит (Казань, 14–19 ноября 2022 г.): Труды и материалы, в трёх томах, том 1.: Каз.: Издательство Казанского университета, 2022. P. 46–50.
This article examines the error classification used in the learner corpus REALEC in terms of how well it meets the requirements of research tasks and how well it is suited to them. ...
Added: January 17, 2023
Review of Practices of Collecting and Annotating Texts in the Learner Corpus REALEC
Vinogradova O. I., Lyashevskaya O., in: Text, Speech, and Dialogue. 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings. Lecture Notes in Computer Science (LNAI), vol. 13502.: Cham: Springer Publishing Company, 2022. P. 77–88.
REALEC, a learner corpus released in open access, had received 6,054 essays written in English by HSE undergraduate students in their English university-level examination by the year 2020. This paper reports on the data collection and manual annotation approaches for the texts of 2014–2019 and discusses the computer tools available for working with the corpus. ...
Added: October 5, 2022
Word-formation complexity: a learner corpus-based study
Lyashevskaya O., Pyzhak J.V., Vinogradova O. I., Russian Journal of Linguistics 2022 Vol. 26 No. 2 P. 471–492
This article explores the word-formation dimension of learner text complexity which indicates how skilful the non-native speakers are in using more and less complex - and varied - derivational constructions. In order to analyse the association between complexity and writing accuracy in word formation as well as interactive effects of task type, text register, and ...
Added: October 5, 2022
Using an Error-Annotated Learner Corpus (REALEC) in DDL Lessons
M. A. Klimova, V. K. Smilga, D. A. Overnikova, in: Труды международной конференции «Корпусная лингвистика–2021».: Скифия-принт, 2021. P. 112–121.
Added: October 31, 2021
Автоматическое обнаружение и исправление деривационных ошибок в письменной речи на русском как иностранном
Vyrenkova A. S., Смирнов И. Ю., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2021 Vol. 19 No. 3 P. 57–68
Learner corpora serve as one of the most valuable sources of statistical data on learners' errors. For instance, data from foreign-language learners’ corpora can be used for the Second Language Acquisition research. However, corpora representativity strongly depends on the quality of its error markup, which is most frequently carried out manually and thus presents a ...
Added: September 24, 2021
Межъязыковая интерференция при выборе видо-временных форм английских глаголов в эссе русскоязычных студентов: корпусное исследование
Vinogradova O. I., Viklova A., in: Межкультурное пространство: лингвистический и дидактический аспекты. Part 2: Материалы секций «Межкультурная лингвистика», «Межкультурная транслатология» и студенческого научного форума.: Петрозаводск: Издательство ПетрГУ, 2021. P. 17–27.
Added: July 7, 2021
Hedges in Russian EAP writing: A corpus-based study of research papers in management
Smirnova E. A., Стринюк С. А., Journal of English as a Lingua Franca 2020 Vol. 9 No. 1 P. 81–101
The fact that English has become a lingua franca of academic communication has led to increased attention to teaching English for academic purposes (EAP) at the academia. Academic discourse markers, such as hedges, have been an important topic in academic writing research whose prime aim is helping non-Anglophone researchers to present their research findings in ...
Added: October 14, 2020
POS tagger evaluation for the automated text analysis and identification of learner error
Vinogradova O. I., Buzanov A., Генералова С. А. et al., in: ПРОСТРАНСТВО НАУЧНЫХ ИНТЕРЕСОВ: ИНОСТРАННЫЕ ЯЗЫКИ И МЕЖКУЛЬТУРНАЯ КОММУНИКАЦИЯ - СОВРЕМЕННЫЕ ВЕКТОРЫ РАЗВИТИЯ И ПЕРСПЕКТИВЫ. Issue 3.: Буки Веди, 2019. Ch. 6 P. 44–49.
Working with learner corpora requires elaborate NLP techniques such as POS-annotation. In this article a team of computational linguists presents their experience of choosing a POS-tagger for precise and effortless annotation of .txt files with Python3. Russian Error-Annotated Learner English Corpus (REALEC) is the underlying corpora to which text features the POS-tagger has to respond. ...
Added: December 28, 2019
Automated assessment of learner text complexity
Lyashevskaya O., Irina Panteleeva, Olga Vinogradova, Assessing Writing 2021 No. 49 Article 100529
EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, and over the past two decades there appeared various tools for the provision of automated instant feedback. The presented paper offers an application that focuses on measuring text complexity, ...
Added: October 20, 2019
Inspector: The Tool For Automated Assessment Of Learner Text Complexity
Olga I. Vinogradova, Olga N. Lyashevskaya, Irina M. P., / NRU Higher School of Economics. Series WP BRP 55/LNG/2017. 2019. No. 79.
EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, but over the past two decades there appeared various tools ensuring the provision of automated instant feedback. The presented paper offers such a tool that focuses on measuring text ...
Added: October 10, 2019