Automatic dependency parsing of a learner English corpus REALEC
NRU HSE, 2017.
Lyashevskaya O., Пантелеева И. М.
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in a general English course. The essays are part of the students' preparation for an independent final examination similar to an international English exam. When adapting existing dependency parsing tools to learner data, one has to take into account the extent to which students' mistakes provoke errors in the parser output. Ungrammatical and stylistically inappropriate utterances may challenge parsing algorithms trained on grammatically well-formed written texts. In our experiments, we compared the output of the dependency parser UDpipe (trained on UD-English 2.0) with the results of manual parsing, with a particular focus on parses of ungrammatical English clauses. We show how mistakes made by students affect the parser's performance. Overall, UDpipe performed reasonably well (UAS 92.9, LAS 91.7). Errors in the automatic annotation fall into three types: (a) incorrect detection of the head, (b) incorrect detection of the relation type, and (c) both. We propose some solutions that could improve the automatic output and thus make the assessment of syntactic complexity more reliable.
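The comparison of automatic and manual parses described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. The token representation (a head index plus a relation label), the function name `compare_parses`, and the toy four-token sentence are all assumptions made for illustration. UAS is computed as the share of tokens with the correct head, and LAS as the share with both the correct head and the correct relation label. Mismatches are sorted into the three error types named above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical token representation: (head index, dependency relation label).
Token = Tuple[int, str]


def compare_parses(gold: List[Token], pred: List[Token]) -> Dict[str, object]:
    """Compare a predicted parse with a manual (gold) parse of the same tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("parses must be non-empty and have the same number of tokens")

    errors: Counter = Counter()
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
        head_ok = g_head == p_head
        rel_ok = g_rel == p_rel
        if head_ok and rel_ok:
            continue
        if not head_ok and not rel_ok:
            errors["both"] += 1        # type (c): head and relation both wrong
        elif not head_ok:
            errors["head"] += 1        # type (a): wrong head only
        else:
            errors["relation"] += 1    # type (b): wrong relation type only

    n = len(gold)
    # UAS counts tokens whose head is correct, whatever the label.
    uas = 100.0 * (n - errors["head"] - errors["both"]) / n
    # LAS counts tokens whose head and label are both correct.
    las = 100.0 * (n - sum(errors.values())) / n
    return {"uas": uas, "las": las, "errors": dict(errors)}


# Toy example (invented data, not taken from REALEC).
gold = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obl")]
pred = [(2, "nsubj"), (0, "root"), (2, "case"), (2, "obj")]
result = compare_parses(gold, pred)
```

In this toy case one token has a wrong head and another has a wrong relation label, so the two scores diverge: the wrong label lowers LAS but leaves UAS unchanged.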
Priority areas:
humanities
Language:
English
Keywords: learner corpus; English as a foreign language; universal dependencies; dependency annotation of learner treebank; evaluation of parser quality; L2 English; syntactic annotation of a corpus; annotation of syntactic dependencies; syntactic annotation of a learner corpus; evaluation of syntactic parsing quality
Publication based on the results of:
Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15
To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...
Added: May 16, 2026
Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66
Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control flow, enabling a comprehensive view of system behavior and making it possible to detect failure points that could otherwise remain hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...
Added: May 16, 2026
Зелинская Ю. Ю., Когнитивные исследования языка 2025 No. 4(65) P. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Vol. 6 No. 1 P. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Neal N. X., Weiqing L., Dacheng H. et al., Algorithms 2026 Vol. 19 No. 5 P. 1–22
In the era of data-driven education, educational social networks generate large volumes of high-dimensional and complex-structured data through learner interactions, collaborative activities, and resource-sharing behaviors, posing significant challenges to traditional unsupervised learning methods. Such data often exhibit non-convex distributions, heterogeneity, and noise sensitivity, making conventional clustering approaches insufficient for capturing their intrinsic structural relationships. To ...
Added: May 13, 2026
Fedorov D., Jezikoslovni Zapiski 2026 No. 32(1) P. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Vol. 23 No. 1 P. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025).
The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
Stepanyants V., Долгов И. М., Хорошилов Г. С. et al., Труды Института системного программирования РАН 2026 Vol. 38 No. 3 P. 95–110
Highly automated and connected vehicles are gradually entering the market. Currently, solutions are being proposed that allow these technologies to be used for cooperative driving automation, which can significantly improve traffic safety. Such technologies and their software should be tested to ensure safety before being implemented in real systems. Verification and validation of vehicular control ...
Added: May 12, 2026
Tikhonov R., Efendiev M. T., Fedotenkov A. A., 2026 International Russian Smart Industry Conference (SmartIndustryCon) 2026 P. 542–547
High-fidelity simulation environments like CARLA and ROS are essential for connected and automated vehicle research. They allow researchers to verify and validate new software and technology without the time, financial, and safety overheads of real-world testing. However, their operation requires considerable expertise for creating platform-specific scenario configuration files, which complicates the research workflow. This paper ...
Added: May 11, 2026
Strizhkova D., / Institute of Russian Literature (Pushkin House) of the Russian Academy of Sciences. Series B001 "Repository of Open Data on Russian Literature and Folklore". 2026.
The database catalogues Russian-language literary works and excerpts printed in literature textbooks, anthologies, readers, and collections of poems and short stories published in France, Germany, Latvia, Estonia, Bulgaria, and Serbia during the first wave of Russian emigration, from 1918 to 1939. The dataset is of interest to researchers of the school literary canon, emigration, and children's reading ...
Added: April 22, 2026
Жигунов А. Ю., / Basic Research Programme. Series HUM "Humanities". 2026. No. 1.
The article attempts to describe the educational potential of Russian animation programmes with respect to the representation of traditional spiritual and moral values. Based on media and semiotic analysis, the method of cultural and historical interpretation, animated Russian projects created from 2000 to 2025, which were broadcast on television channels or streaming ...
Added: April 19, 2026
Малахов В. С., Симон М. Е., Летняков Д. Э. et al., / SSRN. Series Social Science Research Network "Social Science Research Network". 2020.
The notion of “political accommodation” applied to the theory and practice of managing cultural diversity could enrich the Russian academic dictionary. Liberal democratic states invented specific mechanisms for political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...
Added: September 26, 2025
Melville A. Y., Каберник В. В., Mironyuk M. et al., / MGIMO, Ministry of Foreign Affairs of Russia. 2024.
This analytical report is one of the outcomes of research conducted within the consortium of HSE University and MGIMO. It first addresses the conceptualization of national power and related categories and surveys precedents. It then considers the operationalization of the components of national power that we propose. The following sections analyze the methodology used in the report. On this basis, the report proposes ...
Added: September 19, 2025
Antipkina I., Ivanov A., Guzhelya D., / Series WP BRP "Basic research program". 2024.
This study presents a methodology for developing a new questionnaire format called explicit continuum scenario scales, using the example of a client focus questionnaire. Elements of the Rasch Guttman scenario scale methodology were used in its development. In three consecutive studies, different aspects of the scale functioning were investigated. In Study 1, on the sample ...
Added: February 21, 2025
Aitzhanov S., / NRU HSE. Series WP BRP "Linguistics". 2024. No. 116.
This study focuses on examining the role of language as an attribute in the construction of ethnicity within the Korean community in Kazakhstan. The research examines how language functions as an attribute in the categorization and identification processes, and how it interacts with other ethnic attributes such as descent and appearance. Drawing on qualitative methods, ...
Added: December 10, 2024
Kisselev O., Klimov A., Mihail Kopotev, in: Complexity, Accuracy and Fluency in Learner Corpus Research. Volume vi.: Amsterdam: John Benjamins Publishing Company, 2022. Ch. 3 P. 51–80.
The study reports on the results of a corpus-based evaluation of automatically extracted syntactic complexity measures as indices of Russian as a foreign language (FL) and Russian as a heritage language (HL) writing development. A list of 12 syntactic complexity measures was tested on a set of longitudinal, classroom-based data. The analyses demonstrated that the ...
Added: November 25, 2024
Микаелян А. Л., / NRU Higher School of Economics. Series WP BRP "Literary Studies". 2024. No. 28.
The article presents an attempt to examine Goethe's poem "Reineke Fox" in connection with the discussion between Goethe and Schiller about the nature of epic poetry and the principles of its renewal within the poetics of "Weimar Classicism". Goethe introduced into his interpretation of the medieval story of the Fox a number of innovations at various ...
Added: November 19, 2024
Remnev N., Obiedkov S., Rakhilina E. V. et al., / Series Computer Science "arxiv.org". 2023.
Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, ...
Added: October 30, 2024
Nikita Login, Jazykovedny Casopis 2023 Vol. 74 No. 1 P. 345–356
Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In case of multiple choice gapfill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of ...
Added: September 16, 2024
Orekhov B., / Series Computer Science "arxiv.org". 2024.
In this paper, I apply linguistic methods of analysis to non-linguistic data, chess plays, metaphorically equating one with the other and seeking analogies. Chess game notations are also a kind of text, and one can consider the records of moves or positions of pieces as words and statements in a certain language. In this article ...
Added: August 8, 2024
Orekhov B., / Series Computer Science "arxiv.org". 2024.
Burrows' Delta was introduced in 2002 and has proven to be an effective tool for author attribution. Despite the fact that it was applied to different languages, they mostly belong to the same grammatical type and use the same graphic principle to convey speech in writing: a phonemic alphabet with word separation using spaces. The question ...
Added: August 8, 2024
Orekhov B., / Series Computer Science "arxiv.org". 2024.
Added: August 8, 2024
Tsigeman-Gorenko E., Likhanov M., Kalinnikova L. et al., / Series 00 "00". 2024.
Multiple studies show that reading in hard-to-read (dysfluent) fonts can enhance memory and comprehension of learnt material, but it is unclear if this effect extends to second language (L2) learning. This study investigated the impact of dysfluent fonts on L2 text memorisation and comprehension, accounting for learners’ individual differences (gender, L2 anxiety, L2 proficiency and L1 vocabulary size) ...
Added: June 10, 2024