Rethinking Crowd Sourcing for Semantic Similarity

2021.

Solomon S., Cohn A., Rosenblum H., Hershkovitz C., Yamshchikov I. P.

Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators who treat semantic similarity as a binary category (two sentences are either similar or not similar, with no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and aims to stimulate further discussion of human perception of semantic similarity.

Research target: Computer Science

Priority areas: IT and mathematics

Language: English

The Structural Scheme Method for Computer Morphological Analysis of Natural Language Word Forms

Egorova E., Лаврентьев А. М., Chepovskiy A., Фундаментальная и прикладная математика 2014 Vol. 19 No. 3 P. 91–109

The paper proposes the structural scheme method as a model for morphological analysis of word forms in natural languages with developed affixal word formation and inflection. It describes an algorithm for extracting the pseudo-stem, a modification of that algorithm, and an algorithm for recovering the grammatical characteristics of word forms. The application of the proposed method to the analysis of French word forms is described, and the results produced by the proposed algorithms are presented. ...

Added: February 22, 2015

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

Stroudsburg, PA: Association for Computational Linguistics, 2017.

This volume contains the papers presented at BSNLP-2017: the Sixth Workshop on Balto-Slavic Natural Language Processing. The Workshop is organized by SIGSLAV—Special Interest Group on NLP in Slavic Languages of the Association for Computational Linguistics. The Workshops have been convening for over a decade, with a clear vision and purpose. On one hand, the languages from ...

Added: June 13, 2017

Intelligent Systems and Applications

Cham: Springer, 2019.

The Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. It is part of the SAI conferences, which have been held since 2013. The conference series has featured keynote talks, special sessions, poster presentations, tutorials, workshops, and contributed papers each year. The conference focuses on areas of intelligent systems and artificial intelligence (AI) and ...

Added: August 29, 2018

A Technique for Automatic Extraction of Structural Units in Russian Sentences

Болховитянов А. В., Chepovskiy A., Информационные технологии 2012 No. 2 P. 25–29

In this paper, we propose two mathematical models for analyzing Russian sentences to detect noun phrases and participial clauses. The algorithm for participial clause identification is based on the concept of a syntactic relation between a verb and its dependent syntactic units in the Russian language. The algorithms considered, designed on the basis of the proposed models, can ...

Added: September 6, 2012

Extracting Script Information from Texts. Part 1. Problem Statement and Survey of Methods

Суворова М. И., Кобозева М. В., Toldova S. et al., Искусственный интеллект и принятие решений 2020 No. 1 P. 17–26

The article discusses the importance of automatic script analysis for understanding natural language texts. It gives a broad survey of methods and approaches to describing and extracting scripts. Theoretical approaches to formalizing scripts are considered. A list of tasks that make use of information about the script structure of a text is given. Popular approaches to automatic script extraction from texts and methods for evaluating their ...

Added: April 22, 2020

The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics

Bakarov A., Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB '18) 2018 P. 153–161

Swivel (Submatrix-WIse Vector Embedding Learner) is a distributional semantic model based on counting point-wise mutual information values, capable of capturing word-context co-occurrences in the PMI matrix that were not noted in the training corpus. This model outperforms mainstream word embedding training algorithms such as Continuous Bag-of-Words, GloVe and Skip-Gram in word similarity and word analogy ...

Added: December 12, 2020

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings

Springer, 2021.

This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic. The 27 full papers and 4 short papers presented ...

Added: October 7, 2020

Intelligent Computing: SAI 2020: Volume 3

Cham: Springer, 2020.

This book focuses on the core areas of computing and their applications in the real world. Presenting papers from the Computing Conference 2020, it covers a diverse range of research areas and describes various detailed techniques that have been developed and implemented. The Computing Conference 2020, which provided a venue for academic and industry practitioners to share new ...

Added: July 7, 2020

Fine-Tuning Transformers: Vocabulary Transfer

Samenko I., Tikhonov A., Kozlovskii B. et al., / Series Computer Science "arxiv.org". 2021.

Transformers are responsible for the vast majority of recent advances in natural language processing. Most practical natural language processing applications of these models are enabled through transfer learning. This paper studies whether corpus-specific tokenization used for fine-tuning improves the resulting performance of the model. Through a series of experiments, we demonstrate that ...

Added: January 17, 2022

Language Exercise Generation: Emulating Cambridge Open Cloze

Malafeev A., International Journal of Conceptual Structures and Smart Applications (IJCSSA) 2014 Vol. 2 No. 2 P. 20–35

This article presents an approach to the automatic generation of open cloze exercises based on arbitrary English text. The exercise format is similar to the open cloze test used in Cambridge English certificate exams (FCE, CAE, CPE). The presented method also makes it possible to adjust the difficulty of the resulting exercises to better suit ...

Added: November 29, 2014

Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping University Electronic Press, 2015.

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) – NLP4CALL – is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. ...

Added: May 31, 2015

8th Russian Summer School in Information Retrieval (RuSSIR 2014)

Braslavski P., Karpov Nikolay, Worring M. et al., ACM SIGIR Forum 2014 Vol. 48 No. 2 P. 105–110

The 8th Russian Summer School in Information Retrieval (RuSSIR 2014) was held on August 18-22, 2014 in Nizhniy Novgorod, Russia. The school was co-organized by the National Research University Higher School of Economics and the Russian Information Retrieval Evaluation Seminar (ROMIP) ...

Added: August 22, 2015

Information Extraction Based on Deep Syntactic-Semantic Analysis

Skorinkin D.A., Budnikov E. A., Stepanova M. E. et al., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 721–733

This paper presents a rule-based approach to the Information Extraction (IE) task within the FactRuEval-2016 competition. Our system is based on ABBYY Compreno Technology. The technology uses the results of deep syntactic-semantic analysis, which significantly reduces the number of rules needed and keeps them concise. The evaluation was conducted on the FactRuEval dataset. FactRuEval is ...

Added: August 28, 2016

Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers

Switzerland: Springer, 2015.

This book constitutes the proceedings of the Fourth International Conference on Analysis of Images, Social Networks and Texts, AIST 2015, held in Yekaterinburg, Russia, in April 2015. The 24 full and 8 short papers were carefully reviewed and selected from 140 submissions. The papers are organized in topical sections on analysis of images and videos; ...

Added: October 12, 2015

Information Models in Natural Language Text Processing Tasks. Second Edition, Revised

Chepovskiy A., Moscow: Национальный открытый университет «ИНТУИТ», 2015.

The monograph considers various mathematical models for solving practical problems of natural language text processing. Solutions are proposed for problems that arise in organizing the indexing and subsequent retrieval of data. Methods of computational linguistics are used in applied research. The book is intended for developers of information systems and specialists in computational linguistics. ...

Added: May 23, 2015

Methods for automatic term recognition in domain-specific text collections: A survey

Turdakov D., Astrakhantsev N., Fedorenko D., Programming and Computer Software 2015 Vol. 41 No. 6 P. 336–349

Applications related to domain-specific text processing often use glossaries and ontologies, and the main step in constructing such resources is term recognition. This paper presents a survey of existing definitions of the term and its linguistic features, formulates the task definition for term recognition, and analyzes presently available methods for automatic term recognition, such as ...

Added: August 26, 2016

2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021)

Association for Computational Linguistics, 2021.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...

Added: August 31, 2021

Breaking Sticks and Ambiguities with Adaptive Skip-gram

Bartunov S., Кондрашкин Д. А., Osokin A. et al., / Series arXiv:1502.07257 "Computation and language". 2015.

The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take word ambiguity into account and maintains only a single representation per word. Although a number of Skip-gram modifications were proposed to ...

Added: November 5, 2015

TALN-RECITAL 2014 Workshop TALAf 2014 : Traitement Automatique des Langues Africaines (TALAf 2014: African Language Processing)

Marseille: Association pour le Traitement Automatique des Langues, 2014.

Following the first TALAf workshop, held on 8 June 2012 in Grenoble during the JEP-TALN-RECITAL 2012 conference (see the proceedings: http://aclweb.org/anthology//W/W12/#1300), we propose a new edition of this workshop at the TALN 2014 conference on 1 July in Marseille. This second edition shows the value of a French-language workshop on the processing ...

Added: March 26, 2015

Problems of Natural Language Processing in Dialogue Systems

Klyshinskiy E., Жеребцова Ю., Чижик А., Системный администратор 2019 No. 10 P. 82–91

Nowadays, the field of dialogue systems and conversational agents is one of the most rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in implementing intelligent conversational agents in their products. Many recent studies have tended to focus on the possibility of developing task-oriented systems that are able to have long ...

Added: October 26, 2019

Knowledge Engineering and the Semantic Web. 4th Conference, KESW 2013, St. Petersburg, Russia, October 7-9, 2013. Proceedings

Berlin: Springer, 2013.

This book constitutes the refereed proceedings of the 4th Conference on Knowledge Engineering and the Semantic Web, KESW 2013, held in St. Petersburg, Russia, in October 2013. The 18 revised full papers presented together with 7 short system descriptions were carefully reviewed and selected from 52 submissions. The papers address research issues related to knowledge ...

Added: October 14, 2014

Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science

Switzerland: Springer, 2017.

This book constitutes the proceedings of the 5th International Conference on Analysis of Images, Social Networks and Texts, AIST 2016, held in Yekaterinburg, Russia, in April 2016. The 23 full papers, 7 short papers, and 3 industrial papers were carefully reviewed and selected from 142 submissions. The papers are organized in topical sections on machine ...

Added: October 19, 2016

Proceedings of the Artificial Intelligence and Natural Language AINL FRUCT 2016 Conference, Saint-Petersburg, Russia, 10-12 November 2016

FRUCT Oy, 2016.

Proceedings of the AINL FRUCT: Artificial Intelligence and Natural Language Conference 2016 ...

Added: October 19, 2016

Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Association for Computational Linguistics, 2019.

The 4th Workshop on Representation Learning for NLP (RepL4NLP) will be hosted by ACL 2019 and held on 2 August 2019. The workshop is being organised by Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Alexis Conneau, Johannes Welbl, Xian Ren and Marek Rei; and advised by Kyunghyun Cho, Edward Grefenstette, Karl Moritz ...

Added: November 1, 2019