Problems of annotating a corpus of Russian texts in terms of Rhetorical Structure Theory: lessons from creating Ru-RSTreebank
P. 120–126.
This paper addresses several aspects of annotating a Russian discourse treebank. We discuss procedural issues and the difficulties we encountered in adapting Rhetorical Structure Theory (RST) to Russian news texts.
In book
St. Petersburg: St. Petersburg University Press, 2019.
Vinogradova O. I., Lyashevskaya O., in: Text, Speech, and Dialogue. 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings. Lecture Notes in Computer Science (LNAI), vol. 13502. Cham: Springer Publishing Company, 2022. P. 77–88.
REALEC, an open-access learner corpus, had received 6,054 essays written in English by HSE undergraduate students in their university-level English examinations by 2020. This paper reports on the data collection and manual annotation approaches for the texts of 2014–2019 and discusses the computer tools available for working with the corpus. ...
Added: October 5, 2022
Zaides K., Popova T., Bogdanova-Beglarian Natalia, in: Proceedings of Computational Models in Language and Speech Workshop (CMLS 2018) co-located with the 15th TEL International Conference on Computational and Cognitive Linguistics (TEL-2018). Vol. 2303: Computational Models in Language and Speech 2018. Kazan: CEUR Workshop Proceedings, 2018. P. 128–143.
Added: February 3, 2022
Zaides K., in: Proceedings of the International Conference “Corpus Linguistics-2019”. St. Petersburg State University Press, 2019. P. 332–339.
The paper describes the process and results of unifying the annotation of the “Balanced Annotated Text Library” corpus («Сбалансированная аннотированная текстотека»). The corpus consists of several separate blocks representing the spoken language of different social and psychological groups. To support further linguistic research and to allow comparison with data obtained from other corpora, the corpus annotation system had to be unified. At the current stage, the main transcription symbols marking special phenomena characteristic of ...
Added: February 3, 2022
Chistova E., Shelmanov A., Pisarevskaya D. et al., in: Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers. Vol. 12602. Springer, 2021. P. 105–119.
This work presents the first fully-fledged discourse parser for Russian based on the Rhetorical Structure Theory of Mann and Thompson (1988). For the segmentation, discourse tree construction, and discourse relation classification we employ deep learning models. With the help of multiple word embedding techniques, the new state of the art for discourse segmentation of Russian texts is achieved. We found ...
Added: November 17, 2021
Sokolova E. G., Toldova S., Computational Linguistics and Computational Ontologies 2020 No. 4 P. 44–53
The paper discusses the problem of discourse annotation and the consequences of simplifying the relation set for the sake of higher inter-annotator agreement. One of the theoretical approaches to discourse structure representation is the Rhetorical Structure Theory of William Mann and Sandra Thompson [1]. There is a set of rhetorical relations between discourse units that ...
Added: November 17, 2021
Toldova S., Davydova T., Kobozeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 17–20, 2020). Issue 19(26): supplementary volume. 2020. P. 747–761.
The paper presents a corpus study of discourse features in a corpus of blogs. It is based on the data of Ru-RSTreebank annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. Ru-RSTreebank represents the genres of news, popular science, scientific papers, and blog texts. The blog subcorpus contains such topics as ...
Added: November 17, 2021
Pereverzeva S. I., Ermolaeva N. A., Zueva A. et al., Proceedings of the V. V. Vinogradov Russian Language Institute 2019 No. 21 P. 319–325
The paper focuses on manual gesture annotation in the Multimodal Russian Corpus (MURCO), which was started by E. A. Grishina and is continued by the authors of this paper. A key idea of the annotation process is the attempt to provide “the uniformity and commonality of the markup” [Grishina 2010] to the maximum degree ...
Added: April 27, 2020
Toldova S., Davydova T., Kobozeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019). Issue 18. Moscow: Russian State University for the Humanities, 2019. P. 714–727.
The paper presents a corpus study of the Contrast relation between discourse units in Russian. It is based on the data of Ru-RSTreebank annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. The research question is what cue phrases and lexical and grammatical patterns are used to express the ...
Added: April 22, 2020
Sokolova E. G., Toldova S., in: Proceedings of the International Conference “Corpus Linguistics-2019”. St. Petersburg: St. Petersburg University Press, 2019. P. 127–133.
The paper addresses the detection of the Contrast vs. Comparison relations within the framework of the Rhetorical Structure Theory of Mann and Thompson. An analysis of annotated data in terms of logical or pragmatic constraints is suggested. This analysis makes it possible to propose some operational criteria for the relations under discussion. These criteria together with ...
Added: November 25, 2019
Bergelson M., Khudyakova M., in: In Search of Basic Units of Spoken Language: A Corpus-Driven Approach. John Benjamins Publishing Company, 2020. Ch. 8. P. 257–284.
This chapter deals with segmentation, definition of reference units and annotation of the first corpus of Russian narratives by individuals with brain damage – people with aphasia and right hemisphere damage – and neurologically healthy speakers. We show that such parameters as pause length and intonation contours cannot be used for segmentation of impaired speech. ...
Added: October 10, 2019
Shavrina T., Benko V., in: Proceedings of the International Conference “Corpus Linguistics-2019”. St. Petersburg: St. Petersburg University Press, 2019. Ch. 13. P. 94–102.
This paper focuses on combining Russian open corpus resources into a single source. The article describes the motivation for gradually integrating existing text resources into a more general project and analyzes in detail the main steps for converting the existing data to formats based on NoSketch Engine corpus standards and interface. ...
Added: September 9, 2019
Toldova S., Pisarevskaya D., Vasilyeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 30 – June 2, 2018). Issue 17(24). Moscow: Russian State University for the Humanities Publishing Center, 2018. P. 747–761.
The purpose of the paper is to investigate cues signalling the relations between discourse units in Russian. Building a lexicon of discourse connectives is an indispensable subtask in many discourse parsing applications as well as an essential issue in theoretical research on text coherence. In order to develop such a resource for Russian, we have ...
Added: September 1, 2018
Shuchalova Y., Lanin V., Information Technologies 2018 Vol. 24 No. 8 P. 515–523
The paper describes the design stage of a portal for corpus-based research on the English language. Requirements for the solution are formulated, and the linguistic approaches to the tasks at hand are presented. The system modeling process is described, and implementation details specific to the subject domain are considered. A service-oriented architecture is proposed for integrating heterogeneous components. ...
Added: December 14, 2017
Vinogradova O. I., Polylinguality and Transcultural Practices 2018 Vol. 15 No. 2018/3 P. 372–380
Access to a learner corpus has been shown to increase the efficiency of L2 acquisition for learners as well as teaching efficiency for EFL instructors. This paper presents a computer tool for a learner corpus designed at the School of Linguistics of the Higher School of Economics for both categories of users. REALEC, Russian ...
Added: November 8, 2017