Towards the Data-driven System for Rhetorical Parsing of Russian Texts.

Toldova S., Chistova E., Kobozeva M., Shelmanov A., Smirnov I., Pisarevskaya D.

We present the results of the first experimental evaluation of machine learning models trained on RuRSTreebank, the first Russian corpus annotated within the RST framework. Various lexical, quantitative, morphological, and semantic features were used. For rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model achieves the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives, derived from the connectives lexicon for Russian and from other sources.
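The macro F1 score quoted above is the unweighted mean of per-class F1 values, so rare relations count as much as frequent ones. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1` and the relation labels are hypothetical, and the result is on a 0 to 1 scale, whereas the abstract reports a percentage.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical relation labels, for illustration only
gold = ["Cause", "Contrast", "Elaboration", "Elaboration"]
pred = ["Cause", "Elaboration", "Elaboration", "Elaboration"]
print(macro_f1(gold, pred))
```

In this toy example the missed "Contrast" relation contributes an F1 of zero, which lowers the macro average much more than it would lower plain accuracy.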

Language: English

In book

Proceedings of DISRPT 2019 - The Workshop on Discourse Relation Parsing and Treebanking. NAACL HLT 2019

Association for Computational Linguistics, 2019.

On the generalization ability of data-driven models in the problem of total cloud cover retrieval

Krinitskiy M., Alexandrova M., Verezemskaya P. et al., Remote Sensing 2021 Vol. 13 No. 2 Article 326

Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, mostly due to the lack of a systematic approach to the ...

Added: September 24, 2021

Proceedings of the First Workshop on Computational Approaches to Discourse

Association for Computational Linguistics, 2020.

Added: November 18, 2020

Proceedings of DISRPT 2019 - The Workshop on Discourse Relation Parsing and Treebanking. NAACL HLT 2019

Association for Computational Linguistics, 2019.

This book summarizes the main topics at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019). Co-located with NAACL 2019 in Minneapolis, the workshop’s aim was to bring together researchers working on corpus-based and computational approaches to discourse relations. In addition to an invited talk, eighteen papers outlined below were presented, four of which ...

Added: April 22, 2020

Classification Models for RST Discourse Parsing of Texts In Russian

Chistova E., Shelmanov A., Kobozeva M. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019). Issue 18. M.: Russian State University for the Humanities, 2019. P. 163–176.

The paper considers the task of automatic discourse parsing of texts in Russian. Discourse parsing is a well-known approach to capturing text semantics across the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. Recently, the release of the manually annotated Ru-RSTreebank corpus unlocked the possibility of ...

Added: October 16, 2019

New data, new statistics: from the reproducibility crisis to new requirements for data analysis and presentation in the social sciences

Deviatko I. F., Социологические исследования 2018 No. 12 P. 30–38

The article analyzes the main causes and consequences of the interdisciplinary crisis of reproducibility and reliability of research results that has unfolded in the social sciences in parallel with the "data revolution". This crisis is expressed not only in the growing concern of scientists about the reliability of research results and the ...

Added: January 17, 2019

Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16)

Association for Computational Linguistics, 2017.

The volume includes papers presented at the 16th International Workshop on Treebanks and Linguistic Theories (TLT), which brings together developers and users of linguistically annotated natural language corpora. As ‘treebanks’ we consider any pairing of natural language data (spoken or written) with annotations of linguistic structure at various levels of analysis, ranging from e.g. morpho-phonology ...

Added: December 11, 2018

Corpus analysis tools in foreign language teaching

Gorina O. G., Вестник Томского государственного университета 2018 Vol. 22 No. 435 P. 187–194

As the pioneers of data-driven teaching initially suggested, not only the researcher but also the learner should be given the chance to study language through a corpus or to access authentic linguistic data. Working on that assumption, the article elaborates on the potential of corpus analysis for the purpose of L2 teaching. Firstly, a succession of ...

Added: January 21, 2018