Evaluating the Pragmatic Competence of Large Language Models in Detecting Mitigated and Unmitigated Types of Disagreement
This study presents a framework for evaluating how effectively large language models (LLMs) detect disagreement across a wide range of pragmatic strategies, from mitigated forms to overt verbal aggression. Special attention is given to implicit forms such as irony and sarcasm, which pose significant challenges for both automated analysis and interpersonal communication. LLMs were tested on two types of tasks: binary classification (whether an utterance expresses disagreement) and classification of the specific strategy used to express it. Large multilingual models outperformed the other models, especially in binary classification. However, models oriented primarily toward Russian, such as GigaChat and YaGPT, tend to interpret irony and sarcasm more accurately and show a higher density of results. A comparison with human judgments revealed that, despite progress, LLM accuracy in sarcasm detection still lags well behind human performance. These findings suggest that LLMs need further optimization to improve their pragmatic competence in real communicative situations.
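The abstract does not state which metrics were used for the binary task. As an illustration only, assuming each utterance carries a human gold label and the model's answer is mapped to 1 (disagreement) or 0 (no disagreement), binary-task predictions could be scored against human judgments roughly as follows. The function name `binary_scores` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> dict[str, float]:
    """Hypothetical scorer: accuracy, precision, recall and F1,
    treating 1 (disagreement) as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # disagreement found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false alarm
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # disagreement missed
    correct = sum(1 for g, p in pairs if g == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example (invented labels, not data from the study).
gold = [1, 1, 0, 0, 1, 0]  # human judgments
pred = [1, 0, 0, 1, 1, 0]  # model outputs
print(binary_scores(gold, pred))
```

Reporting F1 alongside accuracy is one common choice when the disagreement class may be rarer than the non-disagreement class; the study itself may use different measures.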