The paper presents a supervised machine learning experiment that uses multiple features to identify sentences containing verbal metaphors in raw Russian text. We introduce a custom-built training dataset, describe the feature engineering techniques, and discuss the results. The following features are applied: distributional semantic features, lexical and morphosyntactic co-occurrence frequencies, flag words, quotation marks, and sentence length. We combine these features into models of varying complexity; the results of the experiment demonstrate that fairly simple models based on lexical, morphosyntactic, and semantic features are able to produce competitive results.
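As a rough illustration of the surface-level features named above (flag words, quotation marks, sentence length), a minimal Python sketch might look as follows. The flag-word list, the function name `surface_features`, and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical flag words (expressions that may signal figurative use);
# the actual list used in the experiment is not given in the abstract.
FLAG_WORDS = {"словно", "буквально", "образно"}
QUOTE_CHARS = set('"«»„“”')
PUNCTUATION = '.,!?;:"«»„“”()'


def surface_features(sentence: str) -> dict:
    """Return simple surface features for one sentence."""
    # Naive whitespace tokenization with punctuation stripped from token edges.
    tokens = [t.strip(PUNCTUATION).lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    return {
        "has_flag_word": any(t in FLAG_WORDS for t in tokens),
        "has_quotes": any(ch in QUOTE_CHARS for ch in sentence),
        "length": len(tokens),
    }
```

Calling `surface_features('Он словно «летел» домой.')` yields a small dictionary that could be concatenated with the semantic and co-occurrence features before training a classifier.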
The paper considers the task of automatic discourse parsing of texts in Russian. Discourse parsing is a well-known approach to capturing text semantics across the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus has made it possible to apply supervised machine learning techniques to build such parsers for the Russian language. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory. In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, neural networks, and ensembling of several models by soft voting.
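Soft voting, mentioned above, averages the class probabilities predicted by several models and picks the class with the highest mean. The sketch below shows the idea in plain Python; the relation labels and probability values are toy inputs invented for this example, and the function names are hypothetical rather than part of the described system.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities that several models give one sample."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(prob_sets: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is highest."""
    avg = soft_vote(prob_sets)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Toy example: three models, three illustrative relation labels.
labels = ["elaboration", "contrast", "cause"]
model_outputs = [
    [0.5, 0.3, 0.2],  # e.g. a gradient boosting model
    [0.2, 0.6, 0.2],  # e.g. a neural network
    [0.3, 0.5, 0.2],  # e.g. a linear model
]
print(predict(model_outputs, labels))
```

Unlike hard (majority) voting, this scheme lets a confident model outweigh two uncertain ones, which is one common reason to prefer it when the base classifiers output calibrated probabilities.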
The paper gives an account of a system for automated identification of linguistic metaphor in Russian text. The design of the system is based on five features: semantic heterogeneity, lexical and morphosyntactic metaphor association, concreteness-abstractness, and topic vectors. Since each of these features is motivated by a specific set of assumptions about the linguistic and cognitive nature of metaphor, we undertake a feature analysis aimed at revealing possible linguistic and psycholinguistic cues of metaphoricity. Namely, we extract tentative lexical, morphosyntactic, and topical predictors of metaphoricity; we also test the hypotheses of correlation between metaphoricity, on the one hand, and concreteness as well as semantic and topical heterogeneity, on the other.