Book chapter
Extracting patterns of evaluative expressions from unlabeled text
In book
This paper presents an alternative method for extracting object-based sentiment from text messages. It modifies the method previously proposed by Mingbo [8]: we first parse the syntax of a message and then link the sentiment to the object of analysis (some authors call it the entity; the two terms are used interchangeably in this article). We present two approaches to sentiment polarity classification: syntactic rule patterns and a convolutional neural network (CNN). Even without a domain-specific vocabulary or sophisticated classification algorithms, the rule-based approach achieves a macro-F1 score that ranks around the middle of the participants; adding domain-specific vocabularies raises the macro-F1 score slightly, but it stays close to the average result. The CNN approach uses both syntactic dependencies and linear word order to capture richer information about object relations. The convolution patterns designed in this approach closely resemble the rules obtained with the rule-based approach. We trained the neural network with different Word2Vec (WV) models and compared their performance against one another, showing that training a domain-specific WV model yields a slight improvement. The resulting macro-F1 score would rank among the top three results of the participants in the SentiRuEval 2016 evaluation. We did not submit our results to the competition while it was being held, but compared them with the official results post hoc. We also combine the CNN approach with the rule-based approach and discuss the resulting differences. All training sets, evaluation metrics, and experimental settings follow SentiRuEval 2016.
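The core idea of the rule-based approach, attaching sentiment to the object through syntactic links rather than through mere co-occurrence in the same message, can be illustrated with a minimal Python sketch. It is not the authors' rule set: the `Token` structure, the toy English `LEXICON` and `NEGATORS`, the helper names, and the single rule used here (a lexicon word connected to the object by one dependency arc, with its polarity flipped by an attached negator) are hypothetical simplifications. The parse is written by hand in Universal Dependencies style instead of coming from a parser, and the actual task works on Russian texts.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str  # surface word form
    head: int  # index of the syntactic head in the sentence, -1 for the root
    rel: str   # dependency relation label (Universal Dependencies style)


# Hypothetical toy polarity lexicon and negator list (not the paper's resources).
LEXICON = {"good": 1, "fast": 1, "slow": -1, "rude": -1}
NEGATORS = {"not", "never"}


def linked(sentence, i):
    """Indices of tokens directly connected to token i: its dependents and its head."""
    idx = [j for j, t in enumerate(sentence) if t.head == i]
    if sentence[i].head >= 0:
        idx.append(sentence[i].head)
    return idx


def is_negated(sentence, i):
    """True if token i has a negator among its direct dependents."""
    return any(t.head == i and t.form.lower() in NEGATORS for t in sentence)


def object_polarity(sentence, target):
    """Label the target token by summing polarities of lexicon words linked to it."""
    score = 0
    for j in linked(sentence, target):
        p = LEXICON.get(sentence[j].form.lower(), 0)
        if p and is_negated(sentence, j):
            p = -p  # a negator attached to the sentiment word flips its polarity
        score += p
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# "The bank is good but the app is slow", parsed by hand in UD style.
SENTENCE = [
    Token("The", 1, "det"),
    Token("bank", 3, "nsubj"),
    Token("is", 3, "cop"),
    Token("good", -1, "root"),
    Token("but", 8, "cc"),
    Token("the", 6, "det"),
    Token("app", 8, "nsubj"),
    Token("is", 8, "cop"),
    Token("slow", 3, "conj"),
]

print(object_polarity(SENTENCE, 1))  # bank → positive
print(object_polarity(SENTENCE, 6))  # app → negative
```

A bag-of-words count over the whole sentence would score both objects as neutral (one positive and one negative word), whereas the dependency links assign "good" to the bank and "slow" to the app. A realistic rule set would need longer dependency paths and a Russian lexicon.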
This article presents the results of an analysis of a fragment of an ontology of texts from the political discourse of the Nizhny Novgorod region. It describes the frame structure (actants) of texts on political topics. The object of research is the "evaluation" slot (tone): a fragment of a thesaurus of evaluative means is extracted and interpreted in terms of meaning, expressiveness, and political correctness.
Prosodic markers of discourse tenor are analysed.
The paper discusses plural forms of Russian nouns (in particular, surnames) such as vsjakie tam Ivanovy (‘various Ivanovs’, ‘all sorts of Ivanovs’), which express a negative opinion about the referents. The co-occurrence patterns of such pejorative plural (Pl.Pej) forms are identified from web-corpus data. Pl.Pej forms co-occur primarily with universal quantifiers, including ‘all’, ‘all of these’, etc., and are easily integrated into quantificational expressions, e.g., combinations with numerals, collective nouns, and expressions that include number words such as mnogo (‘many’). These elements convey and reinforce the meaning of multiplicity and non-uniqueness of the objects denoted by Pl.Pej forms. Among the uses of Pl.Pej, the names of “oligarchs” and “right-wing, liberal politicians” predominate. The form appears mainly in heavily politicized texts. The form and its co-occurrence patterns are a legacy of Soviet socio-political discourse and originate in the language of Soviet newspapers. The Pl.Pej form remains part of an aggressive leftist discourse directed against a “group of the rich”. The addresser of such discourse is a representative of a “group of the poor, oppressed, socially humiliated”.
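The kind of corpus check behind the claim that Pl.Pej forms co-occur with quantifiers can be sketched as left-context counting. This is only an illustration under stated assumptions: the abstract does not describe the authors' web-corpus procedure, and the transliterated toy corpus, the `QUANT_MARKERS` list, the window size, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical, transliterated marker list of quantifier-like words
# ("all", "all sorts of", "these", "many"); not the authors' actual list.
QUANT_MARKERS = {"vse", "vsjakie", "eti", "mnogo"}


def quantifier_cooccurrence(sentences, targets, markers, window=3):
    """Count target tokens and the markers found up to `window` tokens to their left.

    Returns (number of target tokens, number of targets with at least one
    marker in the window, Counter of the markers seen)."""
    n_targets = n_with_marker = 0
    seen = Counter()
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for i, tok in enumerate(lowered):
            if tok not in targets:
                continue
            n_targets += 1
            # Markers in the left context window of this target occurrence.
            left = [w for w in lowered[max(0, i - window):i] if w in markers]
            if left:
                n_with_marker += 1
                seen.update(left)
    return n_targets, n_with_marker, seen


# Toy corpus (invented): two pejorative uses and one neutral family reference.
corpus = [
    ["vsjakie", "tam", "Ivanovy"],
    ["vse", "eti", "Petrovy", "opjat'"],
    ["k", "nam", "prishli", "Ivanovy"],
]
total, with_marker, markers = quantifier_cooccurrence(
    corpus, {"ivanovy", "petrovy"}, QUANT_MARKERS)
print(total, with_marker, dict(markers))
```

The third sentence ("the Ivanovs came to us") is a neutral reference to a family, yet a plain surname matcher still counts it; separating Pl.Pej uses from such cases would need additional context beyond this sketch.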