Distractor Generation for Lexical Questions Using Learner Corpus Data

Jazykovedny Casopis. 2023. Vol. 74. No. 1. P. 345–356.
Nikita Login

Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In the case of multiple-choice gap-fill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of this paper, is to generate distractors for the retrieved questions. The presented approach, called DisSelector, is based on supervised learning on specially annotated learner corpus data. For each sentence, a list of distractor candidates was retrieved, and each candidate was then manually labelled as a plausible or implausible distractor. The resulting set of examples was additionally filtered by a set of lexical and grammatical rules and then split into training and testing subsets in a 4:1 ratio. Several classification models, including classical machine learning algorithms and gradient boosting implementations, were trained on the data. Word and sentence vectors from language models, together with corpus word frequencies, were used as input features for the classifiers. The highest F1-score (0.72) was attained by an XGBoost model. Various configurations of DisSelector showed improvements over the unsupervised baseline in both automatic and expert evaluation. DisSelector was integrated into the open-source language testing platform LangExBank as a microservice with a REST API.
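The supervised step described in the abstract can be illustrated with a minimal sketch. It is not the DisSelector implementation: the feature layout, the function names, and the single-threshold classifier (a stand-in for the XGBoost model) are assumptions made for illustration. It shows only the general shape of the pipeline: build features from vectors and frequencies, split the labelled candidates 4:1, fit a classifier, and score it with F1.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def features(candidate_vec, answer_vec, sentence_vec, log_freq):
    """Hypothetical feature layout: similarity to the correct answer,
    similarity to the sentence, and a corpus frequency value."""
    return [
        cosine(candidate_vec, answer_vec),
        cosine(candidate_vec, sentence_vec),
        log_freq,
    ]


def split_4_to_1(examples, seed=0):
    """Shuffle a copy of the examples and split it into training and testing parts, 4:1."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = plausible distractor)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(rows, labels):
    """Stand-in classifier (an assumption, not XGBoost): choose the cut on
    feature 0 that maximises F1 on the training rows."""
    best_cut, best_f1 = 0.0, -1.0
    for cut in sorted({row[0] for row in rows}):
        preds = [1 if row[0] >= cut else 0 for row in rows]
        score = f1_score(labels, preds)
        if score > best_f1:
            best_cut, best_f1 = cut, score
    return best_cut


def predict(rows, cut):
    """Label a candidate as plausible (1) when feature 0 reaches the cut."""
    return [1 if row[0] >= cut else 0 for row in rows]
```

In use, each manually labelled candidate would become a `(features(...), label)` pair; `split_4_to_1` would separate the pairs, `fit_threshold` would be fitted on the training part, and `f1_score` would be computed on the predictions for the testing part. A gradient boosting model would replace `fit_threshold` and `predict` while the rest of the flow stays the same.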

Research target: Philology and Linguistics
Language: English
Keywords: learner corpora, distractor generation, automated question generation
Publication based on the results of:
Second-language acquisition modelling within different frameworks of existing theories on the basis of learner corpora platforms for experiments and computer tools (2023)