

Data Conversion and Consistency of Monolingual Corpora: Russian UD Treebanks

P. 52–65.
Droganova K. A., Lyashevskaya O., Zeman D.
In press

In this paper we focus on syntactic annotation consistency within Universal Dependencies (UD) treebanks for Russian: UD_Russian-SynTagRus, UD_Russian-GSD, UD_Russian-Taiga, and UD_Russian-PUD. We describe the four treebanks, their distinctive features and development. In order to test and improve consistency within the treebanks, we reconsidered the experiments by Martinez Alonso and Zeman; our parsing experiments were conducted using a state-of-the-art parser that took part in the CoNLL 2017 Shared Task. We analyze error classes in functional and content relations and discuss a method to separate the errors induced by annotation inconsistency from those caused by syntactic complexity and other factors.
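As a rough illustration of the error analysis the abstract mentions, the sketch below tallies labeled attachment errors separately for functional and content relations, given a gold and a parsed CoNLL-U file. It is not the authors' procedure: the function names are hypothetical, and the split of UD relations into the two classes (the `FUNCTIONAL` set) is an assumption that may differ from the classes used in the paper.

```python
from collections import Counter

# ASSUMPTION: this split of UD relations is illustrative only;
# the classes used in the paper may differ.
FUNCTIONAL = {"aux", "cop", "mark", "det", "clf", "case", "cc"}


def read_conllu(text):
    """Yield sentences as lists of (head, deprel) pairs for basic words."""
    sent = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Drop relation subtypes: nsubj:pass -> nsubj.
        sent.append((cols[6], cols[7].split(":")[0]))
    if sent:
        yield sent


def error_counts(gold_text, pred_text):
    """Return (totals, errors) per relation class, keyed by the gold relation.

    A token counts as an error when its head or its relation differs from gold.
    Sentences are paired in order; tokenization is assumed to be identical.
    """
    totals, errors = Counter(), Counter()
    for gold, pred in zip(read_conllu(gold_text), read_conllu(pred_text)):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted tokenization differ")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
            cls = "functional" if g_rel in FUNCTIONAL else "content"
            totals[cls] += 1
            if (g_head, g_rel) != (p_head, p_rel):
                errors[cls] += 1
    return totals, errors
```

Calling `error_counts(gold_text, pred_text)` on the contents of two aligned files gives per-class totals and error counts, from which an error rate per class can be derived. Separating inconsistency-induced errors from complexity-induced ones, as the paper discusses, would need further steps that this sketch does not attempt.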

Language: English
Keywords: NLP evaluation, dependency parsing, universal dependencies, annotation consistency, Russian treebanks
Publication based on the results of:
Materials for a Frequency Dictionary of Russian Poetry (2018)

In book

Proceedings of TLT 2018 International Workshop on Treebanks and Linguistic Theories, 13-14 November 2018, Oslo, Norway. NEALT Proceedings Series
Linköping University Electronic Press, 2018.
Similar publications
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Tartu: University of Tartu Library, 2025.
The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...
Added: July 17, 2025
Disambiguation in context in the Russian National Corpus: 20 years later
Lyashevskaya O., Afanasev I., Stefan Rebrikov et al., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue". Issue 22. [s.n.], 2023. P. 307–318.
An updated annotation of the Main, Media, and some other corpora of the Russian National Corpus (RNC) features the part-of-speech and other morphological information, lemmas, dependency structures, and constituency types. Transformer-based architectures are used to resolve the homonymy in context according to a schema based on the manually disambiguated subcorpus of the Main corpus (morphology ...
Added: September 15, 2023
Building a Universal Dependencies Treebank for a Polysynthetic Language: the Case of Abaza
Koshevoy A., Panova A., Makarchuk I., in: Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023). Washington: Association for Computational Linguistics, 2023. P. 1–6.
In this paper, we discuss the challenges that we faced during the construction of a Universal Dependencies treebank for Abaza, a polysynthetic Northwest Caucasian language. We propose an alternative to the morpheme-level annotation of polysynthetic languages introduced in Park et al. (2021). Our approach aims at reducing the number of morphological features, yet providing all ...
Added: March 20, 2023
Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023)
Washington: Association for Computational Linguistics, 2023.
Added: March 20, 2023
Sculpting enhanced dependencies for Belarusian
Yana Shishkina, Lyashevskaya O., in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer, 2022. P. 137–147.
Enhanced Universal Dependencies (EUD) are enhanced graphs expressed on top of basic dependency trees. EUD support representation of deeper syntactic relations in constructions such as coordination, gapping, relative clauses, and argument sharing through control and raising. The paper presents experiments on the EUD parsing of the low-resource Belarusian language, for which no corpora ...
Added: January 4, 2022
An HMM-based PoS tagger for Old Church Slavonic
Lyashevskaya O., Afanasev I., Jazykovedny Casopis 2021 Vol. 72 No. 2 P. 556–567
We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k) annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in within-domain and out-of-domain settings, in which the remaining part of Codex Marianus serves as ...
Added: October 21, 2021
Length of East Caucasian subject indexes: a quantitative research
Moroz G., in: Дурхъаси хазна. Collection of articles for the 60th anniversary of R. O. Mutalov. Moscow: Buki Vedi, 2021. P. 258–282.
In this article I present a connection between the frequency and length of person-number indexes through two independent studies: token frequency obtained from the Universal Dependencies treebanks and type frequency gathered within a typological study. After introducing the results of those two studies, I will present East Caucasian data. I show that the unusual history of ...
Added: May 23, 2021
Humans Keep It One Hundred: an Overview of AI Journey
Shavrina T., Emelyanov A., Fenogenova A. et al., in: Proceedings of The 12th Language Resources and Evaluation Conference. Vol. 12. European Language Resources Association (ELRA), 2020. P. 2276–2284.
Artificial General Intelligence (AGI) is showing growing performance in numerous applications - beating human performance in Chess and Go, using knowledge bases and text sources to answer questions (SQuAD) and even pass human examination (Aristo project). In this paper, we describe the results of AI Journey, a competition of AI-systems aimed to improve AI performance ...
Added: June 15, 2020
Proceedings of The 12th Language Resources and Evaluation Conference
European Language Resources Association (ELRA), 2020.
Welcome to the 12th edition of LREC . . . that should have been in Marseille, first time in France! Unfortunately not now, in May 2020. Now my welcome is completely virtual, to all of you authors of these Proceedings papers and to the colleagues who will look at these. Virtual but not less sincere. ...
Added: June 15, 2020
Adapting the Graph2Vec Approach to Dependency Trees for NLP Tasks
Durandin O., Malafeev A., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 120–131.
In recent works on learning representations for graph structures, methods have been proposed both for the representation of nodes and edges for large graphs, and for representation of graphs as a whole. This paper considers the popular graph2vec approach, which shows quite good results for ordinary graphs. In the field of natural language processing, however, ...
Added: November 16, 2019
A cross-genre morphological tagging and lemmatization of the Russian poetry: distinctive test sets and evaluation
Starchenko A., Lyashevskaya O., in: Digital Transformation and Global Society. Fourth International Conference, DTGS 2019, St. Petersburg, Russia, June 19–21, 2019, Revised Selected Papers. Springer, 2019. P. 732–743.
The poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic ...
Added: June 12, 2019
A Reusable Tagset for the Morphologically Rich Language in Change: a Case of Middle Russian
Lyashevskaya O., in: Computational Linguistics and Intellectual Technologies. Issue 18. Moscow: Russian State University for the Humanities, 2019. P. 422–434.
The paper discusses the standardization efforts to create a morphological standard for the Middle Russian corpus, which is part of the historical collection of the Russian National Corpus (RNC). To meet the needs of different categories of corpus researchers as well as NLP developers, we consider two styles of the morphological annotation (RNC schema and ...
Added: June 12, 2019
Applying statistical tagging to Russian poetry
Starchenko A., Kazakevich L., Lyashevskaya O. / NRU HSE. Series WP BRP "Linguistics". 2018. No. 76.
The poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic ...
Added: December 12, 2018
Amateur Prose On The Web: Verb Construction As A Feature Of Genre Classification
Builova N., in: Proceedings of Third Workshop "Computational linguistics and language science". Issue 4. Manchester: EasyChair, 2019.
In our research we studied the dependency structure of text genres (love stories, detective stories, science fiction and fantasy). The novel characteristics (such syntactic attributes as verb constructions and construction of a specific cumulative threshold) which can be additional machine learning parameters were identified. We conducted an experiment with the novel features and showed that these ...
Added: December 11, 2018
REALEC learner treebank: annotation principles and evaluation of automatic parsing
Lyashevskaya O., Panteleeva I. M., in: Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16). Association for Computational Linguistics, 2017. P. 80–87.
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The original corpus is manually annotated for learners’ errors and gives information on the error span, error type, and the possible correction ...
Added: December 11, 2018
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16)
Association for Computational Linguistics, 2017.
The volume includes papers presented at the 16th International Workshop on Treebanks and Linguistic Theories (TLT), which brings together developers and users of linguistically annotated natural language corpora. As ‘treebanks’ we consider any pairing of natural language data (spoken or written) with annotations of linguistic structure at various levels of analysis, ranging from e.g. morpho-phonology ...
Added: December 11, 2018
Cross-tagset parsing evaluation for Russian
Droganova K. A., Lyashevskaya O., in: Digital Transformation and Global Society. Third International Conference, DTGS 2018, St. Petersburg, Russia, May 30 – June 2, 2018, Revised Selected Papers, Part I. Issue 858. Cham: Springer, 2018. Ch. 31. P. 380–390.
Cross-tagset parsing is based on the substitution of one annotation layer for another while processing data within one language. As often as not, either the native tagger or the dependency parser used in (pre-)annotation of the Gold treebank is not available. The cross-tagset approach allows one to annotate new texts using freely available tools or ...
Added: October 10, 2018
MorphoRuEval-2017: an Evaluation Track for the Automatic Morphological Analysis Methods for Russian
Sorokin A., Shavrina T., Lyashevskaya O. et al., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). Moscow, 2017. P. 297–313.
MorphoRuEval-2017 is an evaluation campaign designed to stimulate the development of the automatic morphological processing technologies for Russian, both for normative texts (news, fiction, nonfiction) and those of less formal nature (blogs and other social media). This article compares the methods participants used to solve the task of morphological analysis. It also discusses the problem ...
Added: October 9, 2018
Automatic morphological analysis on the material of Russian social media texts
Fenogenova A., Kazorin V., Karpov I. et al., in: Proceedings of Third Workshop "Computational linguistics and language science". Issue 4. Manchester: EasyChair, 2019. P. 11–17.
Automatic morphological analysis is one of the fundamental and significant tasks of NLP (Natural Language Processing). Due to the special features of Internet texts, which can be both normative texts (news, fiction, nonfiction) and less formal texts (such as blogs and texts from social networks), morphological tagging has become a non-trivial and topical task. ...
Added: October 5, 2018
Using Universal Dependencies in the Grammatical Parsing of Multilingual Text (the Case of the Impersonal Predicative)
Lyukina E. V., Novosibirsk State University Bulletin. Series: Linguistics and Intercultural Communication 2018 Vol. 16 No. 2 P. 19–33
The paper is dedicated to the Universal Dependencies (UD) initiative, which aims to develop a cross-linguistically consistent annotation scheme for grammatical analysis. The purpose of this initiative is to simplify cross-language research, unify interlanguage linguistic typology, and build a foundation for automated multilingual systems and a universal cross-language text parser. In the first part ...
Added: April 21, 2018
Text collections for evaluation of Russian morphological taggers
Lyashevskaya O., Bocharov V., Sorokin A. et al., Jazykovedny Casopis 2017 Vol. 68 No. 2 P. 258–267
The paper describes the preparation and development of the text collections within the framework of MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate development of the automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with the manually verified high-quality tagging to a single ...
Added: January 30, 2018
Automatic dependency parsing of a learner English corpus REALEC
Lyashevskaya O., Panteleeva I. M. / NRU HSE. Series WP BRP "Linguistics". 2017.
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The essays are a part of students' preparation for the independent final examination similar to the international English exam. While adjusting existing ...
Added: December 15, 2017
Universal Dependencies for Russian: A New Syntactic Dependencies Tagset
Lyashevskaya O., Droganova K., Zeman D. et al. / NRU HSE. Series WP BRP "Linguistics". 2016. No. 44.
This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to comply with certain language-specific syntactic constructions. The tagset was validated by converting two Russian treebanks into the UD format: UD-Russian-SynTagRus and UD-Russian-Google. ...
Added: December 14, 2016