• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Preprints
  • Automatic dependency parsing of a learner English corpus REALEC
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 15, 2026
Preserving Rationality in a Period of Turbulence
The HSE International Laboratory for Logic, Linguistics and Formal Philosophy studies logic and rationality in a transformed world characterised by a diversity of logical systems and rational agents. The laboratory supports and develops academic ties with Russian and international partners. The HSE News Service spoke with the head of the laboratory, Prof. Elena Dragalina-Chernaya, about its work.
May 15, 2026
‘All My Time Is Devoted to My Dissertation
Ilya Venediktov graduated from the Master’s programme at the HSE Tikhonov Moscow Institute of Electronics and Mathematics through the combined Master’s–PhD track and is currently studying at the HSE Doctoral School of Engineering Sciences. At present, he is undertaking a long-term research internship at the University of Science and Technology of China in Hefei, where he is preparing his dissertation. In this interview, he explains how an internship differs from an academic mobility programme, discusses his research topic, and describes the daily life of a Russian doctoral student in China.
May 15, 2026
‘What Matters Is Not What You Study, but Who You Study with
Katerina Koloskova began studying Arabic expecting to give it up after a year—now she cannot imagine her life without it. In an interview for the Young Scientists of HSE University project, she spoke about two translated books, an expedition to Socotra, and her love for Bethlehem.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Automatic dependency parsing of a learner English corpus REALEC

NRU HSE , 2017.
Lyashevskaya O., Пантелеева И. М.
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The essays are a part of students' preparation for the independent final examination similar to the international English exam. While adjusting existing dependency parsing tools to a learner data, one has to take into account to what extent students' mistakes provoke errors in the parser output. The ungrammatical and stylistically inappropriate utterances may challenge parsers' algorithms trained on grammatically appropriate written texts. In our experiments, we compared the output of the dependency parser UDpipe (trained on UD-English 2.0) with the results of manual parsing, placing a particular focus on parses of ungrammatical English clauses. We show how mistakes made by students influence the work of the parser. Overall, UDpipe performed reasonably well (UAS 92.9, LAS 91.7). The following cases cause the errors in automatic annotation a) incorrect detection of a head, b) incorrect detection of the relation type, as well as c) both. We propose some solutions which could improve the automatic output and thus make the assessment of syntactic complexity more reliable.
Research target: Computer Science Philology and Linguistics
Priority areas: humanitarian
Language: English
Full text
Keywords: учебный корпусанглийский язык как иностранныйlearner corpusuniversal dependenciesуниверсальные зависимостиdependency annotation of learner treebankevaluation of parser qualityL2 Englishсинтаксическая разметка корпусаразметка синтаксических зависимостейсинтаксическая разметка учебного корпусаоценка качества синтаксического парсинга
Publication based on the results of:
Лексикологические исследования на базе учебного корпуса REALEC (Learner corpus REALEC: Lexicological observations) (2016)
Similar publications
KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security
Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15
To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...
Added: May 16, 2026
DPN Verifier: Инструментарий для ускоренной верификации и исправления дефектных моделей процессов с данными
Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66
Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control-flow, enabling a comprehensive view of system behavior and possibility to detect failure points that could otherwise be hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...
Added: May 16, 2026
КОГНИТИВНО-АССОЦИАТИВНОЕ ПОЛЕ ОНИМОВ САНКТ-ПЕТЕРБУРГА И ВЕНЫ
Зелинская Ю. Ю., Когнитивные исследования языка 2025 № 4(65) С. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Лично-числовая асимметрия: согласование пассивных миративов в казымском диалекте хантыйского языка
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Т. 6 № 1 С. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
QGKM: A Quantum Fidelity-Based Graph Clustering Framework for Robust Data Pattern Recognition in Education Social Networks QGKM: A Quantum Fidelity-Based Graph Clustering Framework for Robust Data Pattern Recognition in Education Social Networks
Neal N. X., Weiqing L., Dacheng H. et al., Algorithms 2026 Vol. 19 No. 5 P. 1–22
In the era of data-driven education, educational social networks generate large volumes of high-dimensional and complex-structured data through learner interactions, collaborative activities, and resource-sharing behaviors, posing significant challenges to traditional unsupervised learning methods. Such data often exhibit non-convex distributions, heterogeneity, and noise sensitivity, making conventional clustering approaches insufficient for capturing their intrinsic structural relationships. To ...
Added: May 13, 2026
Глаголы перемещения веществ в славянских языках
Fedorov D., Jezikoslovni Zapiski 2026 № 32(1) С. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic langu­ages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent gram­matical phenomena such as argument ...
Added: May 13, 2026
Образ женщины сквозь года: диахронический анализ репрезентации женщин в российской агитационной рекламе
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Т. 23 № 1 С. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
Интегрированная среда моделирования для верификации и валидации программ управления подключенными и высокоавтоматизированными транспортными средствами
Stepanyants V., Долгов И. М., Хорошилов Г. С. et al., Труды Института системного программирования РАН 2026 Т. 38 № 3 С. 95–110
Highly automated and connected vehicles are gradually entering the market. Currently, solutions are being proposed that allow these technologies to be used for cooperative driving automation, which can significantly improve traffic safety. Such technologies and their software should be tested to ensure safety before being implemented in real systems. Verification and validation of vehicular control ...
Added: May 12, 2026
Connected and Automated Vehicle Scenario Manager Graphical User Interface
Tikhonov R., Efendiev M. T., Fedotenkov A. A., 2026 International Russian Smart Industry Conference (SmartIndustryCon) 2026 P. 542–547
High-fidelity simulation environments like CARLA and ROS are essential for connected and automated vehicle research. They allow researchers to verify and validate new software and technology without the time, financial, and safety overheads of real-world testing. However, their operation requires considerable expertise for creating platform-specific scenario configuration files, which complicates the research workflow. This paper ...
Added: May 11, 2026
Школьный литературный канон эмиграции 1918–1939 гг.
Strizhkova D., / Институт русской литературы (Пушкинский Дом) РАН. Серия B001 "Репозиторий открытых данных по русской литературе и фольклору". 2026.
В базе данных представлена роспись русскоязычных литературных произведений и отрывков, напечатанных в учебниках по словесности, хрестоматиях, книгах для чтения, сборниках стихотворений и рассказов, выходивших во Франции, Германии, Латвии, Эстонии, Болгарии, Сербии в период первой волны русской эмиграции с 1918 по 1939 гг. Датасет представляет интерес для исследователей школьного литературного канона, эмиграции и детского чтения ...
Added: April 22, 2026
Современная российская мультипликация как инструмент воспитания традиционных духовно-нравственных ценностей
Жигунов А. Ю., / Basic Research Programme. Серия HUM "Humanities". 2026. № 1.
The article attempts to describe the features of the educational potential of Russian animation programmes in aspect of the representation of traditional spiritual and moral values. Based on media and semiotic analysis, the method of cultural and historical interpretation, animated Russian projects created from 2000 to the 2025, which were translated on television channels or streaming ...
Added: April 19, 2026
Политическая аккомодация культурных различий в индустриально развитых обществах (Political Accommodation of Cultural Differences in Industrialized Societies)
Малахов В. С., Симон М. Е., Летняков Д. Э. et al., / SSRN. Серия Social Science Research Network "Social Science Research Network". 2020.
The  notion  of  “political  accommodation” applied  to the  theory  and  practice  of managing cultural diversity could enrich the Russian academic dictionary. Liberal democratic states invented specific mechanisms for political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...
Added: September 26, 2025
Национальная мощь современных государств: сравнительный анализ. Аналитический доклад
Melville A. Y., Каберник В. В., Mironyuk M. et al., / МГИМО МИД России. 2024.
Данный аналитический доклад является одним из результатов исследований в рамках консорциума НИУ ВШЭ и МГИМО. В нем прежде всего раскрыты вопросы концептуализации национальной мощи и сопутствующих категорий и дается обзор прецедентов. Далее рассматриваются вопросы операционализации предлагаемых нами компонентов национальной мощи. В следующих разделах доклада предлагается анализ вопросов методологии, используемой в докладе. На этой основе предложен ...
Added: September 19, 2025
Explicit continuum scale format reduces the ceiling effect in self-report questionnaires comparing to Likert response format
Antipkina I., Ivanov A., Guzhelya D., / Series WP BRP "Basic research program". 2024.
This study presents a methodology for developing a new questionnaire format called explicit continuum scenario scales, in the example of a client focus questionnaire. Elements of the Rasch Guttman scenario scale methodology were used in its development. In three consequent studies, different aspects of the scale functioning were investigated. In Study 1, on the sample ...
Added: February 21, 2025
Language In The Construction Of Ethnicity Among Koreans In Kazakhstan: A Case Study Of The Korean Community In Karaganda
Aitzhanov S., / NRU HSE. Series WP BRP "Linguistics". 2024. No. 116.
This study focuses on examining the role of language as an attribute in the construction of ethnicity within the Korean community in Kazakhstan. The research examines how language functions as an attribute in the categorization and identification processes, and how it interacts with other ethnic attributes such as descent and appearance. Drawing on qualitative methods, ...
Added: December 10, 2024
Syntactic complexity measures as linguistic correlates of proficiency level in learner Russian
Kisselev O., Klimov A., Mihail Kopotev, , in: Complexity, Accuracy and Fluency in Learner Corpus Research. Volume vi.: Amsterdam: John Benjamins Publishing Company, 2022. Ch. 3 P. 51–80.
The study reports on the results of a corpus-based evaluation of automatically extracted syntactic complexity measures as indices of Russian as a foreign language (FL) and Russian as a heritage language (HL) writing development. A list of 12 syntactic complexity measures was tested on a set of longitudinal, classroom-based data. The analyses demonstrated that the ...
Added: November 25, 2024
CONNECTING ANCIENT AND MODERN: THE MEDIEVAL PLOT ABOUT THE FOX AND THE DISCUSSION BETWEEN GOETHE AND SCHILLER ABOUT GERMAN EPIC POETRY
Микаелян А. Л., / NRU Higher School of Economics. Series WP BRP "Literary Studies". 2024. No. 28.
The article presents an attempt to examine Goethe's poem "Reineke Fox" in connection with the discussion between Goethe and Schiller about the nature of epic poetry and the principles of its renewal within the poetics of "Weimar Classicism". Goethe introduced into his interpretation of the medieval story of the Fox a number of innovations at various ...
Added: November 19, 2024
A Language Model for Grammatical Error Correction in L2 Russian
Remnev N., Obiedkov S., Rakhilina E. V. et al., / Series Computer Science "arxiv.org". 2023.
Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, ...
Added: October 30, 2024
Distractor Generation for Lexical Questions Using Learner Corpus Data
Nikita Login, Jazykovedny Casopis 2023 Vol. 74 No. 1 P. 345–356
Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In case of multiple choice gapfill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of ...
Added: September 16, 2024
You shall know a piece by the company it keeps. Chess plays as a data for word2vec models
Orekhov B., / Series Computer Science "arxiv.org". 2024.
In this paper, I apply linguistic methods of analysis to non-linguistic data, chess plays, metaphorically equating one with the other and seeking analogies. Chess game notations are also a kind of text, and one can consider the records of moves or positions of pieces as words and statements in a certain language. In this article ...
Added: August 8, 2024
How does Burrows' Delta work on medieval Chinese poetic texts?
Orekhov B., / Series Computer Science "arxiv.org". 2024.
Burrows' Delta was introduced in 2002 and has proven to be an effective tool for author attribution. Despite the fact that it was applied to different languages, they mostly belong to the same grammatical type and use the same graphic principle to convey speech in writing: a phonemic alphabet with word separation using spaces. The question ...
Added: August 8, 2024
Does Delta really confirm that Rowling and Galbraith are the same author?
Orekhov B., / Series Computer Science "arxiv.org". 2024.
Added: August 8, 2024
Difficulty overdose? Inconclusive effect of the disfluent font on reading in second language
Tsigeman-Gorenko E., Likhanov M., Kalinnikova L. et al., / Series 00 "00". 2024.
Multiple studies show that reading in hard-to-read (dysfluent) fonts can enhance  memory and comprehension of learnt material, but it is unclear if this effect extends to second language (L2) learning. This study investigated the impact of dysfluent fonts on L2 text memorisation and comprehension, accounting for learners’ individual differences (gender, L2 anxiety, L2 proficiency and L1 vocabulary size) ...
Added: June 10, 2024
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit