Training Multilingual and Adversarial Attack-Robust Models for Hate Detection on Social Media


P. 196–202.

Ryzhova A., Deviatkin D., Volkov S., Budzko V.

Social media provide a large volume of textual information in many languages. This information can contain or provoke hatred towards different social or religious groups. In this paper, we study methods for processing short text messages in English, Hindi, and Russian and for identifying such intolerance with cross-lingual Transformer models, which can also be easily adapted to other languages. We fine-tuned these models with several training techniques to build accurate hate speech detectors that are robust to adversarial attacks. All datasets were additionally preprocessed to improve the quality of model training. For one of the training datasets, we also applied a text attack algorithm that replaces some words with synonyms; for some languages, such an attack can greatly reduce model quality. Experimental results show that mixing adversarial examples into the training dataset and combining deep models into randomized ensembles not only reduces the test error on attacked data for the languages in the dataset (Hindi, Russian) but also improves accuracy in other languages.
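The three ideas named in the abstract (a synonym-replacement attack, mixing attacked examples into the training data, and a randomized ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation. The `SYNONYMS` table, the `rate` parameter, and all function names are hypothetical. "Randomized ensemble" is read here as choosing one member model at random per query, which is only one possible interpretation.

```python
import random

# Hypothetical toy synonym table. A real attack would draw candidates from a
# lexical resource or embedding neighbours; the abstract does not say which.
SYNONYMS = {
    "hate": ["detest", "loathe"],
    "stupid": ["dumb", "foolish"],
}


def synonym_attack(text, rng, rate=0.5):
    """Replace each word that has a listed synonym with probability `rate`."""
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def augment(dataset, rng, rate=0.5):
    """Return the original (text, label) pairs plus one attacked copy of each."""
    attacked = [(synonym_attack(text, rng, rate), label) for text, label in dataset]
    return list(dataset) + attacked


def randomized_ensemble_predict(models, text, rng):
    """Assumed reading of a randomized ensemble: pick one member per query."""
    return rng.choice(models)(text)
```

In this sketch the attacked copies keep their original labels, so a classifier fine-tuned on the output of `augment` sees both the clean and the perturbed wording of each message.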

Language: English


In book

Procedia Computer Science: 2022 Annual International Conference on Brain-Inspired Cognitive Architectures for Artificial Intelligence: The 13th Annual Meeting of the BICA Society

Vol. 213. [s.n.], 2022.

Analysis of the Impact of Input Data Obfuscation on the Effectiveness of Language Models in Detecting Prompt Injection

Krokhin A., Gusev M. M., Software Systems and Computational Methods 2025 No. 2

The article addresses the issue of prompt obfuscation as a means of circumventing protective mechanisms in large language models (LLMs) designed to detect prompt injections. Prompt injections represent a method of attack in which malicious actors manipulate input data to alter the model's behavior and cause it to perform undesirable or harmful actions. Obfuscation involves ...

Added: October 4, 2025

Hate Speech and Target Community Detection in Nastaliq Urdu Using Transfer Learning Techniques

Malik M. S., Aftab N., Mamdouh Jamjoom M., IEEE Access 2024 Vol. 12 P. 116875–116890

Freedom of expression on social media has provided oppressed people with many opportunities to raise their voices against violence and injustice, but this freedom is being misused to spread various forms of hate speech. Several studies have been conducted to identify hate speech in high-resource languages; however, work on under-resourced languages is very limited, especially for Nastaliq Urdu. ...

Added: December 11, 2024

Teaching Materials for the Master Class "Adversarial Attacks on Image Recognition Neural Networks" for University and School Students

Pantiukhin D., Informatics and Education 2023 Vol. 38 No. 1 P. 55–63

The problem of neural network vulnerability has been the subject of scientific research and experiments for several years. Adversarial attacks are one of the ways to “trick” a neural network, to force it to make incorrect classification decisions. The very possibility of adversarial attack lies in the peculiarities of machine learning of neural networks. The ...

Added: April 14, 2023

Detecting ethnicity-targeted hate speech in Russian social media texts

Pronoza E., Panicheva P., Koltsova O. et al., Information Processing and Management 2021 Vol. 58 No. 6 Article 102674

Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user ...

Added: September 2, 2021