Detecting ethnicity-targeted hate speech in Russian social media texts

E. Pronoza; P. Panicheva; O. Koltsova; P. Rosso

doi:10.1016/j.ipm.2021.102674

Information Processing and Management. 2021. Vol. 58. No. 6. Article 102674.

Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user texts (such as irony, false generalization and attribution of unfavored actions to targeted groups), users’ inclination to express opposite attitudes to different ethnic groups in the same text and, finally, lack of research on languages other than English. In this work we address several of these problems in the task of ethnicity-targeted hate speech detection in Russian-language social media texts. Our approach allows us to differentiate between attitudes towards different ethnic groups mentioned in the same text, a task that has never been addressed before. We use a dataset of over 2.6M user messages mentioning ethnic groups to construct a representative sample of 12K instances (ethnic group, text) that are further thoroughly annotated via a special procedure. In contrast to many previous collections that usually comprise extreme cases of toxic speech, the representativeness of our sample secures a realistic and, therefore, much higher proportion of subtle negativity, which additionally complicates its automatic detection. We then experiment with four types of machine learning models, from traditional classifiers such as SVM to deep learning approaches, notably the recently introduced BERT architecture, and interpret their predictions in terms of various linguistic phenomena. In addition to hate speech detection with a text-level two-class approach (hate, no hate), we also justify and implement a unique instance-based three-class approach (positive, neutral, negative attitude, the latter implying hate speech). Our best results are achieved by using fine-tuned and pre-trained RuBERT combined with linguistic features, with F1-hate=0.760, F1-macro=0.833 on the text-level two-class problem, comparable to previous studies, and F1-hate=0.813, F1-macro=0.824 on our unique instance-based three-class hate speech detection task. Finally, we perform error analysis, which reveals that further improvement could be achieved by accounting for complex and creative language issues more accurately, i.e., by detecting irony and unconventional forms of obscene lexicon.
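The two scores quoted in the abstract, F1 for the hate (negative) class and macro-averaged F1, follow standard definitions. The plain-Python sketch below shows how they could be computed over instance-level labels in the three-class setting. The function names, the toy (ethnic group, text) instances and the labels are hypothetical illustrations; this is not the authors' evaluation code.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted label gains a false positive
            fn[truth] += 1   # gold label gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical instances: one text may yield several (ethnic group, text) pairs,
# each with its own attitude label.
instances = [
    ("group_A", "text_1"), ("group_B", "text_1"), ("group_A", "text_2"),
    ("group_C", "text_3"), ("group_B", "text_4"), ("group_C", "text_4"),
]
gold = ["negative", "positive", "neutral", "negative", "neutral", "negative"]
pred = ["negative", "neutral", "neutral", "negative", "neutral", "positive"]

scores = f1_per_class(gold, pred)
print("F1-hate :", scores["negative"])   # negative attitude = hate speech
print("F1-macro:", f1_macro(gold, pred))
```

Because the macro average weights every class equally, a rare class that is classified poorly lowers it as much as a frequent one would, which matters when subtle negativity is a minority of the sample.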

Research target: Computer Science; Philology and Linguistics; Media and Communications

Priority areas: sociology; humanitarian; IT and mathematics

Keywords: Russian language; deep learning; hate speech detection; ethnic hate

Publication based on the results of:

Modeling individuals behavior and socio-psychological characteristics based on multimodal digital traces (2021)

Reported speech constructions in Chuvash: A corpus- and elicitation-based study

Knyazev M., Voprosy Jazykoznanija 2026 No. 1 P. 74–104

The paper is a descriptive survey of reported speech constructions in standard and Maloe Karachkino (Poshkart) Chuvash based on a typological questionnaire. On the basis of corpus data, it is shown that reported speech constructions vary depending on whether reported speech is introduced by a complementizer-like element in combination with an ordinary speech verb; directly by the ...

Added: January 22, 2026

On Photography

Moscow: Ad Marginem Press, 2025.

In his essay "On Photography," the prominent German media theorist Vilém Flusser (1920-1991) asks whether human freedom is possible in the information society, in a world dominated by cameras, where our thoughts, feelings, desires, and actions are robotized. The photographer's intentions are inscribed within the strictly limited corridor of the camera's program—a black box simulating ...

Added: January 22, 2026

LAMBO: Landmarks Augmentation With Manifold-Barycentric Oversampling

Bespalov Y., Buzun N., Kachan O. et al., IEEE Access 2022 No. 10 Article 3219934

We propose the first data augmentation method based on optimal transport theory, with the generated data being guaranteed to belong to the original data manifold. The proposed algorithm randomly samples a clique in the nearest-neighbors graph representing the data knowledge and computes the Wasserstein barycenter between the neighbours with random uniform weights. Being extremely natural- ...

Added: January 21, 2026

Blurred Magnitude Homology of Functional Connectome for ASD Diagnosis

Качура А. С., Chernyshev V. L., Kachan O. et al., Frontiers in Psychiatry 2026 No. 16 Article 1677282

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Existing studies show that adults with ASD may experience accelerated or altered neurocognitive aging. Consequently, cognitive decline in people with ASD can be delayed if timely measures are taken to treat this disorder. This study focuses on the development of a new algorithm ...

Added: January 21, 2026

Morphosyntactic status and semantics of the Shughni marker -ard: towards the development of new case markers in Iranian languages

Падалка П. В., Ryzhova D., Чистякова Д. Г., Voprosy Jazykoznanija 2026 No. 1 P. 40–48

The article is dedicated to the morphosyntactic properties and grammatical functions of the Shughni marker -ard. Shughni, like many other Iranian languages, has a reduced case system that seems to be gradually evolving and expanding. We demonstrate that the marker -ard is one of the main candidates for the status of a new case marker ...

Added: January 21, 2026

PINDAR PYTHIAN 2.13–20: WHAT DOES HIERON HAVE IN COMMON WITH CINYRAS?

Akhunova O., Classical Philology 2026 Vol. 121 No. 1 P. 84–93

In this article I attempt to answer the question: why and on what basis does Pindar compare Hieron with king Cinyras? Pindar marks three points of similarity: in the “portrait” of Cinyras these points are “favorite of Apollo” and “priest of Aphrodite”; the third point is indicated only in the “portrait” of Hieron – this is ...

Added: January 21, 2026

19th Annual Conference on Theory and Applications of Models of Computation, TAMC 2025

Springer, 2026.

This book constitutes the proceedings of the 19th Annual Conference on Theory and Applications of Models of Computation, TAMC 2025, which was held in Jinan, China, during September 19–21, 2025. ...

Added: January 20, 2026

Constructing China's image in the British media during international crises: a case study of The Times newspaper since February 2022

Yin Z., Вестник Пермского университета. Серия: Политология 2025 Vol. 19 No. 2 P. 130–142

International society has noted a series of interlinked events that have implications for the construction of national representations within media discourse. This study probes how The Times represented China's image during global crises, with special focus on the period after Russia initiated its special military operation on February 24, 2022. Although this provides key context, ...

Added: January 20, 2026

Changes in the UK leading media's portrayal of China during the Covid-19 pandemic and the special military operation

Balakina Y. V., Yin Z., Известия Саратовского университета. Новая серия. Серия: Филология. Журналистика 2025 Vol. 25 No. 2 P. 229–236

The aim of the present study is to trace changes in the construction of the image of China in the British media during two crisis periods: the COVID-19 pandemic and the Russian military operation. Each period encompasses a panic (escalation) phase and a recovery (stagnation) phase. Using data from the Factiva database, 70,356 articles published ...

Added: January 20, 2026

11th Russian Supercomputing Days, RuSCDays 2025, Moscow, Russia, September 29–30, 2025, Revised Selected Papers

Springer, 2026.

Added: January 20, 2026

Experimental evidence suggests that null complement anaphora in Russian is not reducible to clausal ellipsis

Knyazev M., Folia Linguistica 2025

Null complement anaphora, NCA (e.g., I suggested the price was too high, and she agreed ∅.), is a long known but poorly understood phenomenon subject to idiosyncratic lexical restrictions. In languages like Russian, however, it is (or appears) productive, with verbs not allowing NCA hard to find, raising the question whether omission of the clausal argument ...

Added: January 19, 2026

Transformation of ideology: Denis Villeneuve's screen adaptation of the novel Dune as a new media text

Aliev R. T., Праксема. Проблемы визуальной семиотики 2026 No. 1 P. 46–74

This article investigates ideological transformations in Denis Villeneuve's screen adaptation of Frank Herbert's novel Dune. The relevance of the research is determined by the fact that adapting complex literary works for mass cinema inevitably alters their semantic structure, affecting not only plotlines but also the philosophical, political, and cultural dimensions of the original text. The ...

Added: January 19, 2026

Polarity-sensitive exceptives: the case of Chechen

Беркович М. А., Proceedings of ConSOLE 2025 P. 258–274

In several languages, the semantics comparable to that of only can be expressed with a bi-partite construction, which consists of a negation marker and a focus particle. This work investigates one such construction in the Chechen language. I try to show that its properties are not captured by the existing analyses of such constructions and ...

Added: January 19, 2026

Russian language and Russian culture in Vietnam: problems of teaching and research

Britov I., Hanoi: Hanoi State University Publishing House, 2025.

No abstract ...

Added: January 18, 2026

Basic vocabulary of the Shughni and Bartangi languages

Armand E., Бадеев А. О., Родной язык: лингвистический журнал 2025 No. 2 P. 153–175

The article analyzes the basic vocabulary (Swadesh list) of two closely related languages of the Shughni-Rushani subgroup of the East Iranian group of languages. The lists were collected by the authors using the elicitation method during field research in the summer of 2025. Special attention is paid to borrowings from Tajik, as well as to ...

Added: January 18, 2026

Travesty of Petrarchism: traces of the Pléiade in a 17th-century French burlesque poem

Golubkov A., Studia Litterarum 2025 Vol. 10 No. 4 P. 92–117

The object of research in this article is the “caprice” (burlesque poem) Melon (1634) by the French poet M.A.G. de Saint-Amant who at the end of the 17th century, despite the negative attitude of Nicolas Boileau-Despréaux to his work, was revered by the camp of the “Moderns” during “the Quarrel of the Ancients and ...

Added: January 18, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Denomination, Religiosity and Anti-Immigrant Attitudes in Europe: Comparative Evidence from the European Social Survey

Dorkhanov I., Sokolov B., / Series OSF "SocArXiv". 2025.

This study investigates the relationship between individual religiosity and attitudes towards immigrants of different religious backgrounds in Europe. Using data from the 7th wave of the European Social Survey (2014-2015), we examine the influence of individual denomination and subjective religiosity level on hostility towards Muslim immigrants and the importance of immigrants’ Christian background. Our analysis, ...

Added: December 23, 2025

Classification Approach to Mapping Cultural Differences: An Illustration Using Survey Data from 60 Russian Regions

Nastina E., Sokolov B., / Series OSF "SocArXiv". 2025.

We argue that a classification-based approach to measuring cultural differences across countries or subnational regions is a promising complement, and sometimes an alternative, to the widely used dimensional method in cross-cultural research. The latter summarises cultural variation using continuous dimensions, for example, Hofstede’s famous individualism-collectivism dimension. However, this approach relies on strong parametric assumptions, which are ...

Added: December 23, 2025

Cross-Nationally, Non-Probability Web Surveys Demonstrate Poorer Demographic Coverage and Yield More Liberal Estimates of Public Opinion than F2F Surveys

Korsunava V., Sokolov B., / Series OSF "SocArXiv". 2025.

Non-probability web surveys offer several advantages over face-to-face (F2F) interviews—they are cheaper, faster, more accessible, and reduce interviewer effects and desirability bias. As such, they are increasingly popular in both academic and commercial research. However, they often yield demographically biased samples, raising concerns about the accuracy of the resulting public opinion estimates. Most studies on ...

Added: December 23, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Detecting Ethnic Conflict in Social Media with Transformers and Augmented Data

Koltsova O., Surkov A., Procedia Computer Science 2025 Vol. 258 P. 2382–2390

Detecting mentions or discussion of ethnic conflict, or verbal participation in it, in user-generated content is a socially important task, as such content has been shown to be related to ethnic clashes on the ground. Yet this task has not been ...

Added: November 28, 2025

Speech acts with polite diminutives: genre and discourse features

Fufaeva I., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2025 Vol. 24 No. 4 P. 78–90

This study delves into speech acts utilizing diminutives for politeness, focusing on their discursive and genre-related aspects. It draws on authorial recordings of spoken discourse, data from the National Corpus of the Russian Language, and recordings of urban speech from the 1970s and late twentieth century. The research highlights the potential usage of polite diminutives in ...

Added: November 25, 2025