Classification Models for RST Discourse Parsing of Texts in Russian

Chistova E., Shelmanov A., Kobozeva M., Pisarevskaya D., Smirnov I., Toldova S.

P. 163–176.

The paper considers the task of automatic discourse parsing of texts in Russian. Discourse parsing is a well-known approach to capturing text semantics across the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus has made it possible to apply supervised machine learning techniques to build such parsers for the Russian language. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory. In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, neural networks, and ensembling of several models by soft voting.
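
As an illustration of the soft voting mentioned in the abstract, the sketch below averages the per-class probabilities produced by several classifiers and picks the label with the highest mean. It is a minimal sketch, not the authors' implementation: the function names, the relation labels, and the probability values are all hypothetical.

```python
from typing import Dict, List


def soft_vote(prob_dists: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the class probabilities predicted by several models."""
    if not prob_dists:
        raise ValueError("need at least one probability distribution")
    # Collect every label seen by any model; a missing label counts as 0.0.
    labels = sorted(set().union(*prob_dists))
    n = len(prob_dists)
    return {label: sum(d.get(label, 0.0) for d in prob_dists) / n
            for label in labels}


def predict_relation(prob_dists: List[Dict[str, float]]) -> str:
    """Return the label with the highest averaged probability."""
    averaged = soft_vote(prob_dists)
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three classifiers for one pair of discourse units.
example = [
    {"elaboration": 0.50, "contrast": 0.30, "cause": 0.20},
    {"elaboration": 0.20, "contrast": 0.60, "cause": 0.20},
    {"elaboration": 0.40, "contrast": 0.35, "cause": 0.25},
]

if __name__ == "__main__":
    print(predict_relation(example))  # prints "contrast"
```

In this made-up example, two of the three models rank "elaboration" first, so a majority (hard) vote would choose it, while soft voting chooses "contrast" because one model is much more confident in that label.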

Language: English

Keywords: RST treebank; feature engineering; discourse parsing

In book

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019)

Issue 18. M.: Russian State University for the Humanities, 2019.

A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis

Abramov A. S., Chernyshev V. L., Mikhaylets E. et al., Series "Social Science Research Network", 2025.

Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have significant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...

Added: September 23, 2025

Entropy-based text feature engineering approach for forecasting financial liquidity changes

Aleksei Riabykh, Suleimanov I., Nagovitcyn I. et al., EPJ Data Science 2025 Vol. 14 Article 17

Changes in individual and institutional financial behavior leading to shifts in liquidity flows often depend on events reflected in news. However, the task of establishing a relationship between financial behavior and news remains challenging and understudied. We propose a news-based feature generation approach that allows accounting for news events in liquidity flow time-series prediction tasks, thereby ...

Added: August 5, 2025

Reducing False Positives in Bank Anti-fraud Systems Based on Rule Induction in Distributed Tree-based Models

Ivan Vorobyev, Krivitskaya A., Computers and Security 2022 Vol. 120 Article 102786

Fraud detection in bank payment transactions suffers from a high number of false positives. To deal with this problem, we introduce a rule generation framework for a fraud-detection system: automatic rule generation using distributed tree-based ML (machine learning) algorithms such as Decision Tree, Random Forest and Gradient Boosting, where the components of expert ...

Added: June 8, 2022

Automated Metaphor Identification in Russian and Its Implications for Metaphor Studies

Badryzlova Y., Lyashevskaya O., Nikiforova A., in: Distributed Computing and Artificial Intelligence, Volume 2: Special Sessions 18th International Conference (Lecture Notes in Networks and Systems 332). Vol. 2. Springer, 2022. Ch. 8. P. 86–96.

Added: September 17, 2021

Proceedings of the First Workshop on Computational Approaches to Discourse

Association for Computational Linguistics, 2020.

Added: November 18, 2020

Proceedings of DISRPT 2019 - The Workshop on Discourse Relation Parsing and Treebanking. NAACL HLT 2019

Association for Computational Linguistics, 2019.

This book summarizes the main topics at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019). Co-located with NAACL 2019 in Minneapolis, the workshop’s aim was to bring together researchers working on corpus-based and computational approaches to discourse relations. In addition to an invited talk, eighteen papers outlined below were presented, four of which ...

Added: April 22, 2020

A Multi-Feature Classifier for Verbal Metaphor Identification in Russian Texts

Badryzlova Y., Panicheva P., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. Ch. 3. P. 23–34.

The paper presents a supervised machine learning experiment with multiple features for identification of sentences containing verbal metaphors in raw Russian text. We introduce the custom-created training dataset, describe the feature engineering techniques, and discuss the results. The following set of features is applied: distributional semantic features, lexical and morphosyntactic co-occurrence frequencies, flag words, quotation ...

Added: August 30, 2018

Rhetorical relation markers in Russian RST Treebank

Toldova S., Dina Pisarevskaya, Ananyeva M. et al., in: Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms. Stroudsburg, PA: Association for Computational Linguistics, 2017.

The paper deals with the pilot version of the first RST discourse treebank for Russian. The project started in 2016. At present, the treebank consists of sixty news texts annotated for rhetorical relations according to the RST scheme. However, this scheme was slightly modified in order to achieve a higher inter-annotator agreement score. During the annotation procedure, we also registered the discourse connectives ...

Added: November 6, 2017

Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Stroudsburg, PA: Association for Computational Linguistics, 2017.

Added: November 6, 2017