Vote’n’Rank: Revision of Benchmarking with Social Choice Theory

M. Rofin; V. Mikhailov; M. Florinsky; A. Kravchenko; E. Tutubalina; T. Shavrina; D. Karabekyan; E. Artemova

P. 670–686.

The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives. Although the paradigm is shifting towards more fine-grained evaluation across diverse tasks, the delicate question of how to aggregate the performances has received particular interest in the community. In general, benchmarks follow unspoken utilitarian principles, where systems are ranked by their mean score over task-specific metrics. Such an aggregation procedure has been viewed as a sub-optimal evaluation protocol, which may have created the illusion of progress. This paper proposes Vote’n’Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory. We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields and identify the best-performing systems in research and development case studies. Vote’n’Rank’s procedures are more robust than the mean average while being able to handle missing performance scores and determine conditions under which a system becomes the winner.
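To make the contrast between mean aggregation and a voting-style ranking concrete, here is a minimal sketch. It uses the Borda count, a classic rule from social choice theory, purely as an illustration. The abstract does not say which rules the framework implements, so the choice of rule, the function names `mean_ranking` and `borda_ranking`, and the scores are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical benchmark: system -> scores on three tasks (higher is better).
scores = {
    "A": [0.99, 0.50, 0.50],  # one outlier task inflates the mean
    "B": [0.60, 0.65, 0.65],
    "C": [0.55, 0.45, 0.60],
}


def mean_ranking(scores):
    """Rank systems by their mean score over tasks (utilitarian aggregation)."""
    return sorted(scores, key=lambda s: mean(scores[s]), reverse=True)


def borda_ranking(scores):
    """Rank systems by Borda count: each task acts as a voter.

    On every task, a system earns one point per system it outperforms.
    Ties within a task are not handled specially in this sketch.
    """
    n_tasks = len(next(iter(scores.values())))
    points = {s: 0 for s in scores}
    for t in range(n_tasks):
        worst_first = sorted(scores, key=lambda s: scores[s][t])
        for position, system in enumerate(worst_first):
            points[system] += position
    return sorted(points, key=points.get, reverse=True)


print(mean_ranking(scores))   # ['A', 'B', 'C']
print(borda_ranking(scores))  # ['B', 'A', 'C']
```

In this toy data, system A wins under the mean because of a single very high score, while B wins under the Borda count because it is ahead on two of the three tasks. This is the kind of disagreement between aggregation procedures that the abstract alludes to.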

Language: English

Keywords: Natural Language Processing (NLP)

Publication based on the results of:

Models and method for analysis of unstructured data, data mining and recommender systems (2023)

In book

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Dubrovnik: Association for Computational Linguistics, 2023.

The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group

Afanasev I., in: Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023). Association for Computational Linguistics, 2023. P. 174–186.

The study of low-resourced East Slavic lects is becoming increasingly relevant as they face the prospect of extinction under the pressure of standard Russian while being treated by academia as an inferior part of this lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a perfect example of ...

Added: May 15, 2023

AI-generated text boundary detection with RoFT

Kushnareva L., Gaintseva T., Magai G. et al. / Series Computer Science "arxiv.org". 2024.

Due to the rapid development of large language models, people increasingly often encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in literature. We attempt to bridge this gap ...

Added: May 29, 2024

Improving the Classification of Scientific Events by Clustering Related Fields of Study

Morkovkin A., Ilvovsky D., in: Information Technology and Nanotechnology (ITNT-2024). Samara: Samara National Research University named after Academician S.P. Korolev, 2024.

Classifying scientific events is a complex task that requires topical categories to be defined and assigned precisely. The task becomes harder when categories are generic or imprecise and fail to reflect the essence of a particular scientific discipline. Our study presents a methodology based on the principles of clustering fields of study (FOS). This approach significantly improved the classification of scientific events, providing a more accurate and complete representation ...

Added: May 17, 2024

Studenda.com: Meta-University Platform Based on Event Classification and Open Academic Graph

Morkovkin A., Sobolev M., Gronas M. et al., in: Proceedings of the 38th AAAI Conference on Artificial Intelligence. [s.n.], 2024.

Studenda.com pioneers the concept of a meta-university. As a cutting-edge platform, it scouts and collates live knowledge events from premier global universities and research hubs. Organizing them based on topics, disciplines, and fields, it facilitates seamless remote participation. Employing a combination of science mapping and the SciBERT model, we’ve developed a methodology to classify ...

Added: October 4, 2023

A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

Vasilii A. Gromov, Nikita S. Borodin, Asel S. Yerbolova, Complexity 2024 Vol. 2024 No. 1 Article 8863360

The present paper introduces a novel object of study, a language fractal structure; we hypothesize that a set of embeddings of all n-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all n). The ...

Added: June 29, 2024

The Syntactic Position of "the People" in the Political Discourse of Left-Wing and Right-Wing Populism (A Syntactic Analysis Based on NLP)

Galochkin A. E., Филологические науки в МГИМО 2024 Vol. 10 No. 2 P. 23–37

This paper attempts to measure populism in English-language speeches of politicians using computational linguistics methods. The relevance of this study is related not only to the rise of populism in the world and the importance of understanding the mechanisms of political discourse, but also to the lack of linguistic research in the context of corpus ...

Added: September 19, 2024

Automatic Detection of Borrowings in Low-Resource Languages of the Caucasus: Andic branch

Zaitsev K., Minchenko A., in: Proceedings of the first workshop on NLP applications to field linguistics. Gyeongju: International Conference on Computational Linguistics, 2022. P. 34–41.

Linguistic borrowings occur in all languages. Andic languages of the Caucasus have borrowings from different donor-languages like Russian, Arabic, Persian. To automatically detect these borrowings, we propose a logistic regression model. The model was trained on the dataset which contains words in IPA from dictionaries of Andic languages. To improve model’s quality, we compared TfIdf ...

Added: September 29, 2024

Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option

Yakovlev K., Nikolenko S., Bout A., in: Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. P. 5967–5974.

The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes in whether to use a tool at all. We introduce Toolken+ that mitigates the first problem by reranking top-k tools selected by ToolkenGPT and the second ...

Added: November 22, 2024

Findings of the Association for Computational Linguistics: NAACL 2022

Seattle: Association for Computational Linguistics, 2022.

Findings of the Association for Computational Linguistics: NAACL 2022 ...

Added: November 1, 2022

Past Voices, Present Insights: Sociolinguistic Research through Literary Artifacts

Sherstinova T., Ziulkova E., Kirina M., in: Proceedings of the 35th Conference of Open Innovations Association FRUCT, 24-26 April 2024, Tampere, Finland. Issue 1. FRUCT Oy, 2024. P. 675–682.

Oral speech, historically the foundational mode of human communication, has not been explored as extensively as its written counterpart. This disparity underscores the necessity of examining sociolinguistic characteristics of speech across time. Current analyses often rely on data from contemporary speech corpora, yet understanding historical speech patterns is equally vital. Literary works, particularly from periods ...

Added: May 27, 2024

An Artificial Intelligence-Based Ethics Index of Russian Banks

Storchevoy M., Parshakov P., Paklina S. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2024 Vol. 520 No. 6

Measuring a company's ethics is an important element in the mechanism of regulating the behavior of market participants, as it allows consumers and regulators to make better decisions, which has a disciplining effect on companies. We tested various methods of machine analysis of consumer feedback from Russian banks and developed an Ethics Index that allows ...

Added: October 31, 2024

Alternative method sentiment analysis using emojis and emoticons

Surikov A., Evgeniia Egorova, Procedia Computer Science 2020 Vol. 178 P. 182–193

Our research aims to develop an alternative method for analyzing the tonality of texts. Most of the traditional methods for determining tonality classes are based on text analysis and ignore various emotional indicators that users actively use in social networks. Therefore, it improves the quality of predicting the tonality of the class. The study ...

Added: May 15, 2024

The Emotion in Text Analyzer: How to Visualize Its Output

Anastasia Kolmogorova, Alexander Kalinin, in: Literature, Language and Computing: Russian Contribution. Springer, 2023. P. 211–221.

The article summarizes the results of the project conducted in the field of emotional text analysis. The project aim is to build up an analyzer able, according to the model of “Lövheim Cube”, to detect eight emotions in the Internet-texts in Russian. Having collected a labeled dataset and trained ML models, we faced the problem ...

Added: October 31, 2023

Disambiguation in context in the Russian National Corpus: 20 years later

Lyashevskaya O., Afanasev I., Rebrikov S. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 22. [s.n.], 2023. P. 307–318.

An updated annotation of the Main, Media, and some other corpora of the Russian National Corpus (RNC) features the part-of-speech and other morphological information, lemmas, dependency structures, and constituency types. Transformer-based architectures are used to resolve the homonymy in context according to a schema based on the manually disambiguated subcorpus of the Main corpus (morphology ...

Added: September 15, 2023

Text Detoxification using Large Pre-trained Neural Models

Dale D., Voronov A., Dementieva D. et al., in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2021. Ch. 629 P. 7979–7996.

We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and ...

Added: July 21, 2023

Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula

Brilliantov K. Yu., Pavutnitskiy F., Pasechnyuk D. et al. / Series Computer Science "arxiv.org". 2023.

Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. ...

Added: May 29, 2024

Trend Detection Using NLP as a Mechanism of Decision Support

P. A. Lobanova, I. F. Kuzminov, E. Yu. Karatetskaia et al., Scientific and Technical Information Processing 2023 Vol. 50 No. 5 P. 440–448

The purpose of this article is to present the principles of a developed algorithm for identifying trends based on the analysis of big text data and presenting the result in formats that are convenient for decision makers to be implemented in the iFORA Big Data Mining System. The paper provides an overview of existing text analytics algorithms; outlines ...

Added: November 21, 2023

Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue

Association for Computational Linguistics, 2023.

Proceedings of the 24th Annual SIGdial Meeting on Discourse and Dialogue, SIGdial 2023, Prague, Czechia ...

Added: October 6, 2023

Language and Artificial Intelligence: Collected Papers from the Conference "Linguistic Forum 2020: Language and Artificial Intelligence"

YASK Publishing House, 2023.

The collection presents articles by participants of the "Linguistic Forum 2020: Language and Artificial Intelligence" (conference, November 2020, RAS), reflecting general and specific problems of scientific research in the field of linguistics and computer technologies. The authors of the published articles offer solutions to special issues against the background of larger-scale objects of scientific heuristics ...

Added: October 31, 2023

Advances in Information Retrieval. 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III

Cham: Springer, 2023.

The three-volume set LNCS 13980, 13981 and 13982 constitutes the refereed proceedings of the 45th European Conference on IR Research, ECIR 2023, held in Dublin, Ireland, during April 2-6, 2023. The 65 full papers, 41 short papers, 19 demonstration papers, 12 reproducibility papers, 7 tutorial papers, and 10 doctoral consortium papers were carefully reviewed ...

Added: April 4, 2023

CLEF 2024 Working Notes

CEUR Workshop Proceedings, 2024.

Added: September 29, 2024

Data Visualization in the Emotional Analysis of Russian-Language Internet Texts Based on the "Lövheim Cube" Model

Kolmogorova A., Kalinin A. A., in: Language and Artificial Intelligence: Collected Papers from the Conference "Linguistic Forum 2020: Language and Artificial Intelligence". YASK Publishing House, 2023. P. 167–181.

In the paper, we discuss the problem of tools intended to be effective for visualizing data obtained by running algorithms for emotional text analysis. We start by overviewing some techniques used to visualize data in projects devoted to exploratory data analysis, sentiment analysis and emotional text analysis. To continue, we suggest two variants ...

Added: October 31, 2023

NLP methods for automatic candidate’s CV segmentation

Tikhonova M., Gavrishchuk A., in: 2019 International Conference on Engineering and Telecommunication (EnT). IEEE, 2019. P. 1–5.

The problem of CV (or resume) segmentation and automatic extraction is becoming increasingly relevant, since solving it could simplify the candidate selection process. The paper proposes a new method of automatic CV segmentation and parsing. The described algorithm is based on Natural Language Processing and Machine Learning methods. The proposed procedure makes it possible to extract information ...

Added: September 22, 2023

Towards Computationally Feasible Deep Active Learning

Tsvigun A., Shelmanov A., Kuzmin G. et al., in: Findings of the Association for Computational Linguistics: NAACL 2022. Seattle: Association for Computational Linguistics, 2022. P. 1198–1218.

Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many others. One of such problems is the excessive computational resources required to train an acquisition model and estimate its uncertainty ...

Added: November 1, 2022