Large Dataset and Language Model Fun-Tuning for Humor Recognition

Blinov V., Bolotova-Baranova V., Braslavski P.

doi:10.18653/v1/P19-1394

P. 4027–4032.

The task of humor recognition has attracted a lot of attention recently due to the need to process large amounts of user-generated texts and the rise of conversational agents. We collected a dataset of jokes and funny dialogues in Russian from various online resources and carefully complemented them with unfunny texts with similar lexical properties. The dataset comprises more than 300,000 short texts, which is significantly larger than any previous humor-related corpus. Manual annotation of 2,000 items proved the reliability of the corpus construction approach. Further, we applied language model fine-tuning for text classification and obtained an F1 score of 0.91 on a test set, which constitutes a considerable gain over baseline methods. The dataset is freely available to the research community.

Keywords: computational humor

In book

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Association for Computational Linguistics, 2019.

KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines

Alexander Baranov, Anna Palatkina, Makovka Y. et al., in: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing. Shumen: INCOMA Ltd, 2025. P. 125–132.

We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and words/phrases the wordplay refers to. Unlike the majority of existing humor collections of canned jokes, KoWit-24 provides wordplay contexts – each headline is accompanied by the news lead ...

Added: February 3, 2026

You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models

Baranov A. M., Kniazhevskii V., Braslavski P., in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Singapore: Association for Computational Linguistics, 2023. P. 13701–13715.

In this study, we focus on automatic humor detection, a highly relevant task for conversational AI. To date, there are several English datasets for this task, but little research on how models trained on them generalize and behave in the wild. To fill this gap, we carefully analyze existing datasets, train RoBERTa-based and Naïve Bayes ...

Added: December 22, 2023

Jokingbird: Funny Headline Generation for News

Login N., Alexander Baranov, Braslavski P., in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer, 2022. P. 97–109.

In this study, we address the problem of generating funny headlines for news articles. Funny headlines are beneficial even for serious news stories – they attract and entertain the reader. Automatically generated funny headlines can serve as prompts for news editors. More generally, humor generation can be applied to other domains, e.g. conversational systems. Like ...

Added: December 6, 2022