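Problems of Annotating a Russian-Language Text Corpus in Terms of Rhetorical Structure Theory: Lessons from Building Ru-RSTreebank [Проблемы разметки корпуса текстов на русском языке в терминах теории риторических структур: из опыта создания ru-rstreebank]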

P. 120–126.
Toldova S., Kobozeva M. V., Tugutova A. A., Pisarevskaya D. B.

This work addresses several aspects of annotating the Russian discourse treebank. We discuss issues in the annotation procedure and the difficulties we encountered in adapting Rhetorical Structure Theory (RST) to Russian news texts.

Language: Russian
Keywords: corpus annotation, rhetorical structure, Rhetorical Structure Theory

In book

Proceedings of the International Conference "Corpus Linguistics - 2019"
St. Petersburg: St. Petersburg University Press, 2019.
Similar publications
Review of Practices of Collecting and Annotating Texts in the Learner Corpus REALEC
Vinogradova O. I., Lyashevskaya O., in: Text, Speech, and Dialogue. 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings. Lecture Notes in Computer Science (LNAI), Vol. 13502. Cham: Springer Publishing Company, 2022. P. 77–88.
REALEC, a learner corpus released in open access, had received 6,054 essays written in English by HSE undergraduate students in their university-level English examinations by 2020. This paper reports on the data collection and manual annotation approaches for the texts of 2014–2019 and discusses the computer tools available for working with the corpus. ...
Added: October 5, 2022
Pragmatic Markers in the Corpus “One Day of Speech”: Approaches to the Annotation
Zaides K., Popova T., Bogdanova-Beglarian Natalia, in: Proceedings of Computational Models in Language and Speech Workshop (CMLS 2018) co-located with the 15th TEL International Conference on Computational and Cognitive Linguistics (TEL-2018). Vol. 2303: Computational Models in Language and Speech 2018. Kazan: CEUR Workshop Proceedings, 2018. P. 128–143.
Added: February 3, 2022
On Unifying the Annotation of the "Balanced Annotated Text Library" Corpus
Zaides K., in: Proceedings of the International Conference "Corpus Linguistics-2019". St. Petersburg State University Press, 2019. P. 332–339.
The paper describes the process and results of unifying the annotation of the "Balanced Annotated Text Library" corpus. The corpus consists of several separate blocks representing the spoken language of members of different social and psychological groups. To support further linguistic research and to allow comparison with data obtained from other corpora, the corpus annotation system had to be unified. At the current stage, the main transcription symbols marking special phenomena characteristic of ...
Added: February 3, 2022
RST Discourse Parser for Russian: An Experimental Study of Deep Learning Models
Chistova E., Shelmanov A., Pisarevskaya D. et al., in: Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers. Vol. 12602. Springer, 2021. P. 105–119.
This work presents the first fully-fledged discourse parser for Russian based on the Rhetorical Structure Theory of Mann and Thompson (1988). For the segmentation, discourse tree construction, and discourse relation classification we employ deep learning models. With the help of multiple word embedding techniques, the new state of the art for discourse segmentation of Russian texts is achieved. We found ...
Added: November 17, 2021
On Forming a Set of Relations for a Corpus with Discourse Annotation of Text
Sokolova E. G., Toldova S., Computational Linguistics and Computational Ontologies, 2020, No. 4, P. 44–53
The work discusses the problem of discourse annotation and the consequences of simplifying the relation set for the sake of higher inter-annotator agreement. One of the theoretical approaches to discourse structure representation is the Rhetorical Structure Theory of William Mann and Sandra Thompson [1]. There is a set of rhetorical relations between discourse units that ...
Added: November 17, 2021
Discourse features of blogs in subcorpus of Russian Ru-RSTreebank
Toldova S., Davydova T., Kobozeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 17–20, 2020). Issue 19(26): supplementary volume. 2020. P. 747–761.
The paper presents a corpus study of discourse features in a corpus of blogs. It is based on the data of Ru-RSTreebank, annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. Ru-RSTreebank represents the genres of news, popular science, scientific papers, and blog texts. The blog subcorpus contains such topics as ...
Added: November 17, 2021
De profundis: Problems of Deep Annotation of a Multimedia Russian Corpus and Ways of Solving Them
Pereverzeva S. I., Ermolaeva N. A., Zueva A. et al., Proceedings of the V.V. Vinogradov Russian Language Institute, 2019, No. 21, P. 319–325
The paper focuses on manual gesture annotation in the Multimodal Russian Corpus (MURCO), which was started by E.A. Grishina and is being continued by the authors of this paper. A key idea of the annotation process is the attempt to provide “the uniformity and commonality of the markup” [Grishina 2010] to the maximum degree ...
Added: April 27, 2020
Contrast and comparison relations in RST framework
Toldova S., Davydova T., Kobozeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019). Issue 18. Moscow: Russian State University for the Humanities, 2019. P. 714–727.
The paper is devoted to a corpus study of the Contrast relation between discourse units in Russian. It is based on the data of Ru-RSTreebank, annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. The research question is what cue phrases and lexical and grammatical patterns are used to express the ...
Added: April 22, 2020
Special Properties of the Rhetorical Relations "Contrast" and "Comparison" Based on Annotation in the Ru-RSTreebank Corpus
Sokolova E. G., Toldova S., in: Proceedings of the International Conference "Corpus Linguistics - 2019". St. Petersburg: St. Petersburg University Press, 2019. P. 127–133.
The work is devoted to distinguishing the Contrast and Comparison relations within the framework of the Rhetorical Structure Theory of Mann and Thompson. An analysis of the annotated data in terms of logical or pragmatic constraints is proposed. This analysis makes it possible to suggest some operational criteria for the relations under discussion. These criteria together with ...
Added: November 25, 2019
Narrative Discourse Segmentation in Clinical Linguistics
Bergelson M., Khudyakova M., in: In Search of Basic Units of Spoken Language: A Corpus-Driven Approach. John Benjamins Publishing Company, 2020. Ch. 8. P. 257–284.
This chapter deals with segmentation, definition of reference units and annotation of the first corpus of Russian narratives by individuals with brain damage – people with aphasia and right hemisphere damage – and neurologically healthy speakers. We show that such parameters as pause length and intonation contours cannot be used for segmentation of impaired speech. ...
Added: October 10, 2019
Omnia Russica: Even Larger Russian Corpus
Shavrina T., Benko V., in: Proceedings of the International Conference "Corpus Linguistics - 2019". St. Petersburg: St. Petersburg University Press, 2019. Ch. 13. P. 94–102.
This paper focuses on combining Russian open corpus resources into one single source. The article describes the motivation for gradual integration of existing text resources to create a more general project and analyzes in detail the main steps to merge the existing data to formats based on NoSketch Engine corpus standards and interface. ...
Added: September 9, 2019
The cues for rhetorical relations in Russian: "Cause-Effect" relation in Russian Rhetorical Structure Treebank
Toldova S., Pisarevskaya D., Vasilyeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 30 – June 2, 2018). Issue 17(24). Moscow: Russian State University for the Humanities Publishing Centre, 2018. P. 747–761.
The purpose of the paper is to investigate cues signalling the relations between discourse units in Russian. Building a lexicon of discourse connectives is an indispensable subtask in many discourse parsing applications as well as an essential issue in theoretical research on text coherence. In order to develop such a resource for Russian, we have ...
Added: September 1, 2018
A Research Portal for Analysing and Evaluating the Style of Scientific Publications
Shuchalova Y., Lanin V., Information Technologies, 2018, Vol. 24, No. 8, P. 515–523
The paper describes the design stage of a portal for corpus-based studies of the English language. Requirements for the solution are formulated, and linguistic approaches to the stated tasks are presented. The system modelling process is described, and implementation details are considered with regard to the specifics of the subject domain. A service-oriented architecture is proposed for integrating heterogeneous components. ...
Added: December 14, 2017
Automated Assessment of Learners' Lexicon Using a Learner Corpus
Vinogradova O. I., Polylinguality and Transcultural Practices, 2018, Vol. 15, No. 2018/3, P. 372–380
Access to a learner corpus has been shown to increase the efficiency of L2 acquisition for learners as well as teaching efficiency for EFL instructors. This paper presents a computer tool for a learner corpus designed at the School of Linguistics of the Higher School of Economics for both categories of users. REALEC, Russian ...
Added: November 8, 2017