PolSentiLex: Sentiment Detection in Socio-political Discussions on Russian Social Media

O. Koltsova; S. Alexeeva; S. Pashakhin; S. Koltsov

doi:10.1007/978-3-030-59082-6_1

P. 1–16.

We present PolSentiLex, a freely available Russian-language sentiment lexicon designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments by the top 2,000 LiveJournal bloggers posted during one year (~1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) that was marked up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words’ sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. We experimented with several versions of the marked-up lexicon, and the final version was tested for quality against the only other freely available Russian-language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They show that, in terms of F-macro, lexicon-based approaches outperform machine learning by 11%, and our lexicon outperforms the alternative one by 11% on the first collection and by 7% on the negative scale of the second collection, while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.
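As a rough illustration of the kind of evaluation the abstract describes, the following minimal sketch classifies texts with a sentiment lexicon and scores the result with macro-averaged F1 (the F-macro metric mentioned above). It is not the authors' pipeline: the toy lexicon, its scores, the whitespace tokenizer, the sign-of-mean decision rule, and the function names `classify` and `macro_f1` are all assumptions made for illustration. Real Russian text would also need lemmatization before lookup.

```python
# Hypothetical toy lexicon: word -> sentiment score (negative < 0 < positive).
# Entries and scores are illustrative only, not taken from PolSentiLex.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}


def classify(text, lexicon=TOY_LEXICON):
    """Label a text by the sign of the mean score of its lexicon words."""
    # Assumption: naive lowercase + whitespace tokenization, no lemmatization.
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not scores:
        return "neutral"  # no lexicon word found in the text
    mean = sum(scores) / len(scores)
    if mean > 0:
        return "positive"
    if mean < 0:
        return "negative"
    return "neutral"


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    pairs = list(zip(gold, predicted))
    f1_values = []
    for c in sorted(set(gold)):
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_values.append(2 * precision * recall / (precision + recall))
        else:
            f1_values.append(0.0)
    return sum(f1_values) / len(f1_values)


if __name__ == "__main__":
    texts = ["great news", "awful bad day", "nothing here", "good but awful"]
    gold = ["positive", "negative", "neutral", "positive"]
    predicted = [classify(t) for t in texts]
    print(predicted)
    print(round(macro_f1(gold, predicted), 4))
```

Macro averaging weights every class equally regardless of its frequency, which matters when one sentiment class dominates a collection.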

Language: English

Keywords: Russian language; sentiment analysis; social media; socio-political domain; lexicon-based approach

Publication based on the results of:

Online communication: cognitive limits and methods of automatic analysis (2020)

In book

Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science

Book 1292: Communications in Computer and Information Science. Springer, 2020.

Russian Pronouns with Focus Antecedents: Coreference and Binding in Corpora

Tiskin D., Компьютерная лингвистика и интеллектуальные технологии 2026 No. 24 P. 656–665

Despite considerable interest in the factors influencing the choice of pronoun (reflexive or personal) with an antecedent in Russian, the role of the anaphoric relation (coreference or semantic binding) has been understudied, and there is disagreement as to the acceptability of particular data points. To clarify things, I employ large corpora (Araneum and GICR) to study the ...

Added: July 19, 2026

Abstracts of the Fifteenth Shmelev Readings (On the 100th Anniversary of the Birth of Academician Dmitry Nikolaevich Shmelev): The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in the Context of Modernity

Moscow: Vinogradov Russian Language Institute of the Russian Academy of Sciences, 2026.

A collection of abstracts from the Fifteenth Shmelev Readings (on the 100th anniversary of the birth of Academician Dmitry Nikolaevich Shmelev), The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in the Context of Modernity. It covers various aspects of contemporary Russian studies, from historical lexicology to present-day transformations in the pragmatics and semantics of words. ...

Added: June 23, 2026

Juxtapositional vs. possessive-like encoding in Russian specificational constructions

Logvinova N., Russian linguistics 2026 Vol. 50 Article 11

This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOM honor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...

Added: May 18, 2026

Prospects for media monitoring in public opinion research (the case of trust in the president)

Ankudinov I., Социология: методология, методы, математическое моделирование 2025 No. 61 P. 165–203

The changing political mood of Russians is a constant subject of interest for sociological agencies. With the development of the Internet, conventional questionnaire research began to be supplemented by online surveys and, despite some skepticism, by social media mining. This article attempts to adjust an accidental web-sample so as to bring its estimates closer to ...

Added: April 22, 2026

Discriminative lemmatization of abbreviations in the LLM era

Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 527 P. 146–155

This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...

Added: March 10, 2026

Rubic2: Ensemble Model for Russian Lemmatization

Afanasev I., Glazkova A., Lyashevskaya O. et al., in: Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025). Association for Computational Linguistics, 2025. P. 157–170.

Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for the Russian language, the errors in which can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language ...

Added: March 10, 2026

Transformer-based approaches for lemmatizing abbreviations in Russian texts

Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47

This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...

Added: March 10, 2026

Data Analytics for Predicting Situational Developments in Smart Cities: Assessing User Perceptions

Kharlamov A. A., Pilgun M., in: Special Issue Sensing Technology for Smart Cities: Data, Analytics, and Visualizations. Vol. 24. Issue 15. [s.n.], 2024.

The analysis of large volumes of data collected from heterogeneous sources is increasingly important for the development of megacities, the advancement of smart city technologies, and ensuring a high quality of life for citizens. This study aimed to develop algorithms for analyzing and interpreting social media data to assess citizens’ opinions in real time and ...

Added: February 22, 2026

Online discourse on China's demographic policy: methodological aspects of analyzing posts on the social network Weibo

Bocharova A., Денисов И. Е., Зуенко И. Ю., Вестник Санкт-Петербургского университета. Востоковедение и африканистика 2025 Vol. 17 No. 2 P. 366–377

The article analyzes how contemporary Chinese society perceives recent changes in China’s demographic policy. These changes involved the relaxation of restrictions on the number of children in a family, first to two children in 2015 and subsequently to three children in 2021. The relevance of this research stems from the ...

Added: February 19, 2026

The Legal Status of Compatriots Living in Post-Soviet Countries amid an Unstable International Situation

Затулин К. Ф., Егоров В. Г., Докучаева А. В. et al., Moscow: Institute of Diaspora and Integration (Institute of CIS Countries), 2025.

The book "The Legal Status of Compatriots Living in Post-Soviet Countries amid an Unstable International Situation" presents the results of a study conducted in Abkhazia, Azerbaijan, Armenia, Belarus, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, the Pridnestrovian Moldavian Republic, Tajikistan, Uzbekistan, Estonia, and South Ossetia. The study was carried out by the Institute of Diaspora and Integration (Institute of CIS Countries) in 2024. It included an analysis of regulatory ...

Added: February 3, 2026

Methods of teaching primary school children to read in Russian and English: similarities and differences

[s.n.], 2022.

The article highlights the importance of teaching children to read, along with its specific features and components; considers the main methods used to teach reading in both Russian and English; and gives a comparative characterization of the two languages. In addition, the article also compares the methods of teaching reading ...

Added: January 31, 2026

Semi-fake indexicals in Russian

Tiskin D., Типология морфосинтаксических параметров 2025 Vol. 8 No. 1 P. 112–129

There are several rival theories of fake indexicals, i.e. bound indexicals (prominently pronouns) whose φ-features do not semantically contribute to focus alternatives (e.g. Only Mary did her homework, John didn’t do his). According to Minimal Pronoun theories (such as Kratzer’s or Wurmbrand’s), bound pronouns are Merged without φ-features and acquire them under binding via agreement-like ...

Added: January 26, 2026

Some modifications to I. Bassi's theory of bound uses of indexical expressions

Tiskin D., Типология морфосинтаксических параметров 2024 Vol. 7 No. 1 P. 107–123

Fake indexicals (FIs), or bound-variable uses of e.g. 1st- and 2nd-person pronouns, have been analysed by Bassi (2021) as arising from a post-syntactic process of inspecting the features of the referent. This leads to a peculiar analysis of the syntax and semantics of relative clauses containing FIs. I argue for a more ...

Added: January 26, 2026

Changes in the UK leading media's portrayal of China during the Covid-19 pandemic and the special military operation

Balakina Y. V., Yin Z., Известия Саратовского университета. Новая серия. Серия: Филология. Журналистика 2025 Vol. 25 No. 2 P. 229–236

The aim of the present study is to trace changes in the construction of the image of China in the British media during two crisis periods: the COVID-19 pandemic and the Russian military operation. Each period encompasses a panic (escalation) phase and a recovery (stagnation) phase. Using data from the Factiva database, 70,356 articles published ...

Added: January 20, 2026

A comparative analysis of American tourists' unique experiences of the Lincoln Memorial in the pre-COVID and post-COVID periods

Smolyanina E., Morozova I., Харитонова Н. В., Географический вестник 2025 No. 4 (75) P. 162–177

The unique tourist experience is one of the main components of tourism activity. However, it is not studied in Russian and Western science. This determined the purpose of the study, that is to identify the characteristics of unique American tourists’ experiences in online reviews about the Lincoln Memorial on the travel site TripAdvisor in the ...

Added: January 7, 2026

Forewarned is forearmed? Anti-vaccine content and users' perception of warning labels

Petrov I., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 6 P. 110–131

The proliferation of misinformation online has compelled social media platforms to develop effective countermeasures. This study investigates user perceptions of different interface warning labels, using anti-vaccine content as an example. The research focuses on the social network «VKontakte» (VK), a major platform in the Russian-speaking internet segment that has previously experimented with such labels. We ...

Added: December 30, 2025

The problem of forming national self-awareness in children through the study of their native language in the works of K. D. Ushinsky

Бизяева Н. Д., Проблемы современного образования 2025 No. 4 P. 134–141

This study is the result of understanding the views of K. D. Ushinsky on the problem of forming national self-awareness in children in the process of studying their native language. It was determined that the idea of nationality, expressed in the theoretical and axiological principles of K. D. Ushinsky, was quite clearly expressed in “The ...

Added: December 16, 2025

The use of social networking media in a turbulent society: the case of Russia, 2022–2024

Davydov S. G., Мониторинг общественного мнения: Экономические и социальные перемены 2026 No. 2 P. 236–258

This article presents the results of an analysis of the audience dynamics of social networking media in Russia from January 2022 to September 2024. The study draws on data from two monitoring sources: the Cross Web internet audience measurement project by Mediascope and a study of publication activity by Medialogia. To address comparative tasks, two indices are proposed: a dynamic structural index and a dynamic integral index. The analysis made it possible ...

Added: December 12, 2025