Cleaning Up After a Party: Post-processing Thesaurus Crowdsourced Data

Antropova O., Arslanova E., Shaposhnikov M., Braslavski P., Mukhin M.

doi:10.1007/978-3-030-01204-5_13

P. 133–138.

The study deals with post-processing of a noisy collection of synsets created using crowdsourcing. First, we cluster long synsets in three different ways. Second, we apply four cluster cleaning techniques based either on word popularity or on word embeddings. Evaluation shows that the method based on word embeddings and existing dictionary definitions delivers the best results.

Language: English

Keywords: crowdsourcing, thesaurus

In book

Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings

Issue 930. Switzerland: Springer, 2018.

Human Agency as a Factor of Corporate Success

Sorokin P. S., Afanaseva I., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 4 P. 202–224

The article is devoted to the study of manifestations and methods of supporting agentic (i.e. transforming the environment in a direction not determined by it) behavior as a factor of success of contemporary corporations under conditions of neo-structuration, that is, a new phase of societal evolution, which assumes a change in the relationship between ...

Added: September 5, 2025

Contest design and solvers' engagement behaviour in crowdsourcing: The neo-configurational perspective

Tekic A., Alfonzo Pacheco D. V., Technovation 2024 Vol. 132 Article 102986

Companies face the challenges of attracting solvers and motivating them to dedicate their time and effort to develop solutions in crowdsourcing contests. Previous research emphasizes the importance of crowdsourcing contest design for fostering solvers' engagement. However, even though contests are designed as a combination of various design elements, such as seeker's identity disclosure, seeker's status, ...

Added: March 5, 2024

Individual "Agency" as an Element of Human Potential: Types, Manifestations, and Effects in the Corporate Sector. Scientific Digest No. 10 (27)

Sorokin P. S., Afanaseva I., Шмаевка В. К. et al., Moscow: Издательский дом НИУ ВШЭ, 2023.

The issue of agency (enterprise, initiative) is one of the central ones for the corporate sector. The key factor determining the importance of this issue is the processes of ‘destructuration’, that is, the growth of variability in the forms of social organization in various spheres of public life. The authors identified three levels of proactive behavior ...

Added: November 16, 2023

Quantifying local and mesoscale drivers of the urban heat island of Moscow with reference and crowdsourced observations

Varentsov M., Fenner D., Meier F. et al., Frontiers in Environmental Science 2021 Vol. 9 Article 716968

Urban climate features, such as the urban heat island (UHI), are determined by various factors characterizing the modifications of the surface by the built environment and human activity. These factors are often attributed to the local spatial scale (hundreds of meters up to several kilometers). Nowadays, more and more urban climate studies utilize the concept ...

Added: October 4, 2023

New needs of Russian higher education in the digital age

Nikiporets-Takigawa G., Skorodumova O., Melikov I., Revista Conrado 2022 Vol. 18 No. 88 P. 285–290

The article explored new needs in the higher education system in Russia in the context of the global digitalization of society. The innovative possibilities of using the new social network technologies blockchain, big data, collective intelligence technologies, and artificial intelligence in higher education were analyzed. Russian projects for their use in the practical activities of ...

Added: July 19, 2023

The Toloka Platform as a Source of Respondents for an Online Survey: An Experience of Data Quality Assessment

Gavrilov K. A., Социология: методология, методы, математическое моделирование 2021 No. 53 P. 165–209

The article presents the experience of using the Yandex Toloka crowdsourcing platform to recruit respondents for an online survey. Analyzing methodological publications on a similar foreign platform, Amazon Mechanical Turk, we put forward hypotheses about the data quality obtained via Toloka in comparison with the results collected using other convenience sample types: online panels and recruitment ...

Added: February 19, 2023

"Ideographic Dictionary of a Dialect Language Personality" as a Means of Studying the Worldview

Zemicheva S., in: Lexicography of the Digital Age: Proceedings of the International Symposium (September 24–25, 2021).: Издательство Томского государственного университета, 2021. P. 344–346.

The paper presents the experience of compiling an electronic idiolect dictionary of the ideographic type, based on recordings of the speech of a speaker of a Siberian dialect. The features of the dictionary are briefly characterized. Its potential for use within the cognitive research paradigm is described. ...

Added: November 1, 2022

On the Project of an Educational Reference Dictionary of Cognitive Terms

Romanova T. V., in: Когнитивные исследования языка. Issue 2(49): Cognitive Linguistics and Intercultural Communication.: Tambov: Издательский дом ТГУ им. Г.Р. Державина, 2022. P. 97–102.

The article is devoted to a project for a reference dictionary of terms used in cognitive science, including cognitive linguistics. ...

Added: September 12, 2022

A Scientometric Exploration of Crowdsourcing: Research Clusters and Applications

Ozcan S., Boye D., Arsenyan J. et al., IEEE Transactions on Engineering Management 2022 Vol. 69 No. 6 P. 3023–3037

Crowdsourcing is a multidisciplinary research area that represents a rapidly expanding field where new applications are constantly emerging. Research in this area has investigated its use for citizen science in data gathering for research and crowdsourcing for industrial innovation. Previous studies have reviewed and categorized crowdsourcing research using qualitative methods. This has led to the ...

Added: September 12, 2022

Public Consultations as a Way to Improve the Efficiency of Drafting Regulatory Legal Acts

Medoeva B., Гражданское общество в России и за рубежом 2022 No. 2 P. 33–36

The paper discusses the advantages and disadvantages of public consultations when conducting regulatory impact assessment. Consultations are considered as a way to identify criminological risks of draft acts in the field of economic activity. To increase the efficiency of designing legal acts, consultations should move from the "open" category to the "mass" category. The issue ...

Added: June 9, 2022

Urban heat island of the Moscow megacity: the long-term trends and new approaches for monitoring and research based on crowdsourcing data

Varentsov M., Konstantinov P., Shartova N. et al., IOP Conference Series: Earth and Environmental Science 2020 Vol. 606 Article 012063

This paper reports on various aspects of the urban heat island (UHI) of the Moscow megacity – its spatial and temporal variability and linkages with human thermal comfort. Firstly, we analyze long-term trends of air temperature, UHI intensity and thermal stress indices based on meteorological observations over the period 1977-2018. We show that the city ...

Added: March 17, 2022

Social media mining for ideation: Identification of sustainable solutions and opinions

Ozcan S., Suloglu M., Sakar C. O. et al., Technovation 2021 Vol. 107 No. September 2021 P. 1–12

The availability of social media-based data creates opportunities to obtain information about consumers, trends, companies and technologies using text mining techniques. However, the quality of the data is a significant concern for social media-based analyses. The aim of this study was to mine tweets (microblogs) to explore trends and retrieve ideas for various purposes such ...

Added: December 12, 2021

The Use of "Electronic Parliament" Tools in Parliamentary Procedures

Koshel A., Вестник Московского университета. Серия 11: Право 2021 No. 3 P. 62–86

Foreign legal and political science notes the positive effect from the interpenetration of the institutions of direct and representative democracy, especially by way of modern digital technologies. The key forms of such interpenetration have been and remain the popular (civil) legislative initiative and public hearings (public discussion of bills). The experience of crowdsourcing in carrying ...

Added: December 6, 2021

I paid a bribe: An experiment on information sharing and extortionary corruption

Ryvkin D., Serra D., Tremewan J. C., European Economic Review 2017 Vol. 94 P. 1–22

Theoretical and empirical research on corruption has flourished in the last three decades; however, identifying successful anti-corruption policies remains a challenge. In this paper we ask whether bottom-up institutions that rely on voluntary and anonymous reports of bribe demands, such as the I paid a bribe website first launched in India in 2010, could act ...

Added: October 11, 2021

On the impact of predicate complexity in crowdsourced classification tasks

Ramírez J., Baez M., Casati F. et al., in: WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining.: Association for Computing Machinery (ACM), 2021. P. 67–75.

Added: October 10, 2021

A cognitive model to enhance professional competence in computer science

Aleshinskaya E., Albatsha A., in: Procedia Computer Science. Issue 169: Postproceedings of the 10th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2019 (Tenth Annual Meeting of the BICA Society).: Elsevier, 2020. P. 326–329.

The paper presents the results of the cognitive modeling of the COMPUTER SCIENCE terminological system in the form of a thesaurus. The thesaurus comprises over 3000 units, which are drawn from explanatory monolingual and bilingual dictionaries of computer science terms representing the basic phenomena and processes in the professional context. Methodologically, the analysis is based ...

Added: April 13, 2021

The Future of the Digital Revolution: Collapse of the Labor Market or Scientific Crowdsourcing?

Корнилов А. М., Вопросы политической экономии 2020 No. 1(21) P. 165–177

Scientific and technical progress, inherent in the "digital economy", which has taken on a new quality under the conditions of the so-called digital revolution, threatens to do away with labour as a factor of production, if ... that discourage economic growth; the financial instruments that make up for the corresponding adverse consequences; and the new forms ...

Added: February 19, 2021

The Basic Dilemma of the Digital Transformation of the Economy

Корнилов А. М., Теоретическая экономика 2020 No. 8(68) P. 32–38

The ongoing process of digital transformation that the world economic system is currently experiencing has recently been perceived as a universal solution to all the problems of the world economy. Meanwhile, its very conceptual basis is such that it promises in the near future rather than construction of a utopian “knowledge economy”, institutionalization of imitation develop ...

Added: February 19, 2021

Fake opinion detection: how similar are crowdsourced datasets to real data?

Fornaciari T., Cagnina L., Россо П. et al., Language Resources and Evaluation 2020 Vol. 54 No. 4 P. 1019–1058

Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models ...

Added: October 29, 2020

Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings

Karyaeva M., Braslavski P., Kiselev Y., in: Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018.: Springer, 2018. P. 76–87.

The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered, then single words and multiwords are extracted as hypernym candidates. A classification-based approach on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym ...

Added: March 11, 2019

Vote Aggregation Techniques in the Geo-Wiki Crowdsourcing Game: A Case Study

Baklanov A., Fritz S., Khachay M. et al., in: Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science. Vol. 661.: Switzerland: Springer, 2017. P. 41–50.

The Cropland Capture game (CCG) aims to map cultivated lands using around 170000 satellite images. The contribution of the paper is threefold: (a) we improve the quality of the CCG’s dataset, (b) we benchmark state-of-the-art algorithms designed for an aggregation of votes in a crowdsourcing-like setting and compare the results with machine learning algorithms, (c) ...

Added: January 23, 2019