Text integrity assessment: Sentiment profile vs rhetoric structure

B. Galitsky; D. Ilvovsky; S. Kuznetsov

doi:10.1007/978-3-319-18117-2_10

P. 126–139.

Galitsky B., Ilvovsky D., Kuznetsov S.

We formulate the problem of text integrity assessment as learning the discourse structure of text, given a dataset of texts with high integrity and low integrity. We use two approaches to formalizing discourse structures, the sentiment profile and the rhetoric structure, relying on a sentence-level sentiment classifier and a rhetoric structure parser, respectively. To learn discourse structures, we use a graph-based nearest neighbor approach, which allows for explicit feature engineering, and also SVM tree kernel–based learning. Both learning approaches operate on graphs (parse thickets), which are sets of parse trees extended either with additional node labels for sentiment or with additional arcs for rhetoric relations between different sentences. Evaluation in the domain of valid vs. invalid customer complaints (those with a non-cohesive argumentation flow, indicating a bad mood of the complainant) shows a stronger contribution of rhetoric structure information than of sentiment profile information. Both learning approaches demonstrated that the discourse structure obtained by an RST parser is sufficient to conduct text integrity assessment. At the same time, the sentiment profile-based approach shows much weaker results and does not strongly complement the rhetoric structure-based one.
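To make the idea of a sentiment profile concrete, here is a minimal sketch, not the authors' method, assuming each text has already been reduced by some sentence-level sentiment classifier to a sequence of polarities (hypothetically encoded as -1 negative, 0 neutral, +1 positive). The paper's nearest neighbor learner works on parse thickets; this toy swaps in plain Levenshtein distance over profiles as the similarity measure and a 1-nearest-neighbor rule. The names `edit_distance` and `nearest_neighbor_label` and the training data are invented for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: one polarity per sentence (-1, 0, +1).
Profile = Tuple[int, ...]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sentiment profiles."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def nearest_neighbor_label(query: Profile,
                           training: List[Tuple[Profile, str]]) -> str:
    """1-NN: return the label of the training profile closest to the query."""
    _, label = min(training, key=lambda item: edit_distance(query, item[0]))
    return label


# Toy, made-up training profiles for illustration only.
training = [
    ((0, -1, -1, 0, -1), "valid"),
    ((-1, 1, -1, 1, -1, 1), "invalid"),
]
print(nearest_neighbor_label((0, -1, -1, -1), training))  # prints: valid
```

The query differs from the first profile by one insertion but from the erratically alternating second profile by at least two edits, so it receives the "valid" label. A profile alone ignores which sentences are rhetorically linked, which is consistent with the abstract's finding that it carries less signal than rhetoric structure.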

Language: English

Keywords: sentiment analysis, parse thicket, rhetoric structure

Publication based on the results of:

Data mining based on lattices of closed descriptions and applied ontologies (2015)

In book

Computational Linguistics and Intelligent Text Processing. 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II.

Vol. 9042. Berlin: Springer, 2015.

Changes in the UK leading media's portrayal of China during the Covid-19 pandemic and the special military operation

Balakina Y. V., Yin Z., Izvestiya of Saratov University. New Series. Series: Philology. Journalism 2025 Vol. 25 No. 2 P. 229–236

The aim of the present study is to trace changes in the construction of the image of China in the British media during two crisis periods: the COVID-19 pandemic and the Russian military operation. Each period encompasses a panic (escalation) phase and a recovery (stagnation) phase. Using data from the Factiva database, 70,356 articles published ...

Added: January 20, 2026

Comparative analysis of American tourists' unique experiences of the Lincoln Memorial in the pre-COVID and post-COVID periods

Smolyanina E., Morozova I., Харитонова Н. В., Geographical Bulletin 2025 No. 4 (75) P. 162–177

The unique tourist experience is one of the main components of tourism activity. However, it has not been studied in Russian or Western scholarship. This determined the purpose of the study: to identify the characteristics of American tourists' unique experiences in online reviews of the Lincoln Memorial on the travel site TripAdvisor in the ...

Added: January 7, 2026

The imperative internet comment as a special genre of conflict internet communication

Shulginov V., Speech Genres 2025 Vol. 20 No. 3(47) P. 327–336

The article examines the imperative internet comment as a special genre of conflict internet discourse. The research was based on the study of two communities of the social network “VKontakte”, differing in the structure of social connections: vertical type (official community “VKontakte with authors”) and horizontal type (“Showbiz stars news”). Using automatic methods of data collection and analysis, ...

Added: October 12, 2025

Representation of the Post-Soviet Countries in the Global Online Information Space in 2020–2021: Frequency of Mention, Media Dynamics, Mood Characteristics

Sharikov A., in: Internet in the Post-Soviet Area: Technological, Economic and Political Aspects. Cham: Springer, 2023. Ch. 1 P. 7–46.

The chapter contains results of a study of the representation of 19 post-Soviet countries and territories on the global Internet in 2020–2021. It was carried out with the help of the FACTIVA monitoring database (information texts, over 23000 online resources from more than 100 countries in 26 languages). It turned out that in 2020–2021 only ...

Added: February 6, 2025

Representation of Russia in British online sources in 2022

Sharikov A., RUDN Journal of Studies in Literature and Journalism 2024 Vol. 29 No. 3 P. 534–550

The article examines the peculiarities of the representation of Russia in British online sources in 2022, when Russia launched a special military operation in Ukraine. The author used a statistical approach to analysis based on the Factiva monitoring system, whose database contains about 4.5 million texts published on 416 British online resources from January 1 ...

Added: February 5, 2025

On the ratio of positive-tone and negative-tone messages on Russian-language news websites

Sharikov A., Потапова В. В., Bulletin of the Media Industry Academy 2023 Vol. 34 No. 2 P. 48–64

The article presents the results of a study conducted at the Higher School of Economics (HSE) on the corpus of texts of the monitoring system Factiva, published in 2020. The purpose of the study is to identify the quantitative relationship between positive and negative tone publications on Russian-language online resources in comparison with publications of ...

Added: February 5, 2025

Fear and Loathing in Russian Literature: A Case of Emotion Annotation of Short Stories of the 20th Century

Anna Moskvina, Margarita Kirina, in: 27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, Vol. 2534. Springer, 2025. P. 113–129.

The paper presents an investigation of the emotional aspect of the Russian short story of the 20th century. Our study is two-fold: firstly, we delve into emotional representation at the lexical level, building upon previous work on utilizing vector models to quantify emotional content. In this study, we introduce an annotated corpus where words are ...

Added: November 29, 2024

Party-political dynamics in Norway as a factor in Russian-Norwegian relations

Chistikov M., Polis. Political Studies 2024 No. 4 P. 38–55

The relations between Russia and Norway are of a contradictory nature, containing both positive and negative elements. In the scientific literature, the systemic factors affecting Russian-Norwegian relations are well studied, while the internal political reasons for the transformation of bilateral relations are insufficiently explored. In 2013, i.e. before the international political crisis of 2014, a ...

Added: August 2, 2024

Identifying American tourists’ unique experiences from the Lincoln Memorial

Smolyanina E., Morozova I., Kharitonova N., Geographical Bulletin 2024 No. 2(69) P. 150–164

Detailed experiences of travelers are presented in online tourist reviews that affect the way other tourists perceive and plan their trips. Such reviews are sources of information in the form of open writing that allows reliable sharing of experience about tourist attractions. Previous studies have made use of tourist reviews to obtain lists of the ...

Added: July 18, 2024

Perception of AI-generated art: text analysis of online discussions

Bosonogov S., Suvorova A., Journal of Mathematical Sciences 2023 Vol. 529 P. 6–23

In this work we analyze comments on three subreddits related to AI-generated art to understand how people perceive the ability of AI to create art and the topics and moods of discussions in the context of widespread usage of pre-trained models. We used computational text analysis techniques such as LDA topic modeling and sentiment analysis ...

Added: February 4, 2024

The research potential of a corpus of Soviet songs: the emotional tonality and geography of song lyrics through the lens of computer technologies

Kolmogorova A., Зарембо В. С., Ткачева Е. С. et al., in: Linguistic Semantics in the Spatial Dimension: Dictionary. Discourse. Corpus. Yekaterinburg: Kabinetny Uchenyi, 2024. Ch. 10 P. 423–445.

The purpose of this study is to describe the characteristics of the text of a popular Soviet song as a linguo-ideological phenomenon. The corpus of Soviet songs collected by the research group is used as material. The focus of this publication is on two characteristics: changes in the emotional tonality of popular songs released on ...

Added: December 10, 2023

About the past, but at different times: a computer analysis of USSR/Russian history textbooks for six generations of students

Kolmogorova A., Колмогорова П. А., Куликова Е. Р., Tomsk State University Journal of Philology 2024 No. 89 P. 73–103

In this article, we focus on the analysis of the texts of three history textbooks for university students published at different times: in 1946, 1983 and 2006. As material, we use the texts that each textbook devotes to seven historical topics, from the beginnings of the Kiev principality to the Reforms of ...

Added: December 10, 2023

Sentiment Analysis of Literary Texts: A Study of Theme and Readers' Preferences in Russian Short Stories from 1900-1930s

Tatiana Sherstinova, Anna Moskvina, Margarita Kirina et al., in: Literature, Language and Computing: Russian Contribution from the LiLaC-2023. Springer, 2025. P. 23–35.

Added: December 9, 2023

Where Is Happily Ever After? A Study of Emotions and Locations in Russian Short Stories of 1900–1930

Moskvina A., Kirina M., in: Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023). Springer, 2023. P. 123–135.

The paper tackles the problem of the automatic detection of emotions in literary texts using distributional semantics techniques. The experiment was carried out on the material of Russian short stories from the 1900-1930s. We investigated the emotional lexis distribution across different locations in narratives. At first, we calculated the semantic association score between each word ...

Added: December 9, 2023

Unhappy in their own way: how can the tonality of a literary text be measured?

Sherstinova T., Moskvina A., Kirina M. et al., in: Proceedings of the International Conference "Corpus Linguistics 2023". St. Petersburg: St. Petersburg State University Press, 2024. P. 232–240.

In the experimental study, the results of three different approaches to the evaluation of the tonality of literary texts are compared: dictionary-based, machine learning, and distributional semantics. The material for analysis was a selection of 210 stories by Russian writers from the first three decades of the 20th century. The research showed that the correlation ...

Added: December 9, 2023

From love to hate: the distribution of emotional vocabulary in the Russian short story of the early 20th century

Moskvina A., Kirina M., in: Proceedings of the International Conference "Corpus Linguistics 2023". St. Petersburg: St. Petersburg State University Press, 2024. P. 156–166.

The paper presents the results of experiments investigating the distribution of emotional vocabulary in Russian short stories of the beginning of the 20th century. The emotionality of words and texts is determined automatically using methods of distributional semantics, which require neither dictionaries nor preliminary data annotation. The results include data ...

Added: December 9, 2023

Think about what you’ve learned: sentiment analysis for modeling user experience in online education

Kirina M., Human Being: Image and Essence. Humanitarian Aspects 2024 No. 2(58) P. 176–204

The article focuses on the application of opinion mining techniques to evaluate user experience on the Hyperskill educational platform, using Python, Java, and Kotlin programming projects as the basis of analysis. The study utilizes sentiment analysis and keyword extraction methods to gauge users' attitudes towards the platform, learning process, and topics covered. To achieve this, ...

Added: December 9, 2023

Attitude of Russians to the topic of material well-being: analysis of comments in social media

Fabrykant M., Magun V., Милкова М. А., SocArXiv, 2023.

The current study is a content analysis of news comments posted on the Russian social network VKontakte to investigate the expression of opinions on material well-being. Based on NLP methods, we analyze the main groups of expression, which are considered in the context of formulating evaluations and appealing to social norms. We analyze the content of discourse at the micro level; analyze how attitudes towards the topic ...

Added: December 4, 2023

Sentiment analysis as a method for studying the information agenda and public opinion (the case of the media and social networks of the PRC)

Анташева М. С., Lobanova P., Isaeva J. K. et al., Sociology: Methodology, Methods, Mathematical Modeling 2023 No. 57 P. 7–41

The information agenda broadcast by Chinese media resources is a source of up-to-date data on public opinion on key issues of social welfare. Due to the technical peculiarities of the organization of Chinese websites and the need to attract additional resources for automatic processing (parsing) of texts in Chinese, this topic is not widely represented in domestic and foreign studies. The ...

Added: November 9, 2023

Data visualization in the emotional analysis of Russian-language internet texts based on the "Lövheim cube" model

Kolmogorova A., Калинин А. А., in: Language and Artificial Intelligence: Collected Papers Based on the Conference "Linguistic Forum 2020: Language and Artificial Intelligence". YaSK Publishing House, 2023. P. 167–181.

In the paper, we discuss the problem of choosing effective tools for visualizing the data obtained by running algorithms for emotional text analysis. We start by reviewing some techniques used to visualize data in projects devoted to exploratory data analysis, sentiment analysis and emotional text analysis. We then suggest two variants ...

Added: October 31, 2023