Rethinking Crowd Sourcing for Semantic Similarity
Solomon S., Cohn A., Rosenblum H., Hershkovitz C., Yamshchikov I. P.
Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators that treat semantic similarity as a binary category (two sentences are either similar or not similar and there is no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and stimulates further discussions on human perception of semantic similarity.
16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part I ISBN: 978-3-319-18110-3 (Print) 978-3-319-18111-0 (Online) ...
Added: April 23, 2015
CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016
Aachen : CEUR Workshop Proceedings, 2017
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse tools of natural language processing. While purely statistical approaches proved powerful and efficient for many NLP tasks, there are many applications that would benefit from the formal models and approaches traditional language science has to offer. With ...
Added: June 25, 2017
Информационные модели в задачах обработки текстов на естественных языках. Второе издание, переработанное
, М. : Национальный открытый университет «ИНТУИТ», 2015
В монографии рассмотрены различные математические модели для решения практических задач обработки текстов на естественных языках. Предлагаются решения проблем, возникающих при организации индексации и последующего поиска данных. Методы компьютерной лингвистики применяются для прикладных исследований. Предназначена для разработчиков информационных систем, специалистов в области компьютерной лингвистики. ...
Added: May 23, 2015
M. : Russian State University for the Humanitie, 2019
The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...
Added: June 12, 2019
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), co-located with NAACL 2016, San Diego, California, June 16, 2016
Stroudsburg, PA : Association for Computational Linguistics, 2016
Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of papers reporting results on the MUC/ACE/OntoNotes corpora. Given ...
Added: December 7, 2016
Cham : Springer, 2019
Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. This conference is a part of SAI conferences being held since 2013. The conference series has featured keynote talks, special sessions, poster presentation, tutorials, workshops, and contributed papers each year. The conference focus on areas of intelligent systems and artificial intelligence (AI) and ...
Added: August 29, 2018
Switzerland : Springer, 2015
This book constitutes the refereed proceedings of the 6th Conference on Knowledge Engineering and the Semantic Web, KESW 2015, held in Moscow, Russia, in September/October 2015. The 17 revised full papers presented together with 6 short system descriptions were carefully reviewed and selected from 35 submissions. The papers address research issues related to semantic web, ...
Added: September 16, 2015
Data Analytics and Management in Data Intensive Domains. 23rd International Conference, DAMDID/RCDL 2021, Moscow, Russia, October 26–29, 2021, Revised Selected Papers
“Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research promoting cooperation and exchange of ideas in the area of data analysis and management in data intensive domains. Approaches to data analysis and management being developed in specific data intensive domains of X-informatics (such as X = astro, bio, chemo, geo, medicine, neuro, physics, ...
Added: August 30, 2021
, , et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Т. 18 № 1 С. 5-21
The grammatical ambiguity (multiple sets of grammatical features for one word form or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...
Added: December 11, 2019
, , et al., Programming and Computer Software 2014 Vol. 40 No. 5 P. 288-295
A framework for fast text analysis, which is developed as a part of the Texterra project, is described. Texterra provides a scalable solution for the fast text processing on the basis of novel methods that exploit knowledge extracted from the Web and text documents. For the developed tools, details of the project, use cases, and ...
Added: November 26, 2017
, , et al., Breaking Sticks and Ambiguities with Adaptive Skip-gram / . 2015.
Recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram as well as most prior work on learning word representations does not take into account word ambiguity and maintain only single representation per word. Although a number of Skip-gram modifications were proposed to ...
Added: November 5, 2015
, Научно-технический вестник Поволжья 2014 № 6 С. 197-199
The paper describes an approach to the implementation of a system that automates the processing of design documentation. Documentation analysis is based on natural language processing, specially developed object-oriented language and ontological resources. As a result, the system highlights linkages between the documents and its semantic indexing. Users get an opportunity of easily navigate through ...
Added: December 13, 2014
Cham : Springer, 2020
This book focuses on the core areas of computing and their applications in the real world. Presenting papers from the Computing Conference 2020 covers a diverse range of research areas, describing various detailed techniques that have been developed and implemented. The Computing Conference 2020, which provided a venue for academic and industry practitioners to share new ...
Added: July 7, 2020
Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014)
Ekaterinburg : CEUR Workshop Proceedings, 2014
AIST'2014 is an international data science conference on Analysis of Images, Social Networks, and Texts. Traditionally, the conference is held annually in Yekaterinburg, Russia. The conference is intended for computer scientists and practitioners whose research interests involve Internet mathematics and other related fields of data science. LIST OF TOPICS (NON EXHAUSTIVE) Applications of Data Mining and Machine ...
Added: August 28, 2014
Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 29 мая — 1 июня 2019 г.)
М. : Издательский центр «Российский государственный гуманитарный университет», 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers
Berlin : Springer, 2014
This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...
Added: November 13, 2014
, , Chekhov's Gun Recognition / Cornell University. Series Computer Science "arxiv.org". 2021.
Chekhov's gun is a dramatic principle stating that every element in a story must be necessary, and irrelevant elements should be removed. This paper presents a new natural language processing task — Chekhov's gun recognition or (CGR) — recognition of entities that are pivotal for the development of the plot. Though similar to classical Named Entity Recognition ...
Added: December 3, 2021
, , Научно-техническая информация. Серия 2: Информационные процессы и системы 2021 № 1
The paper proposes algorithms that perform automatic morphemic analysis of words and methods of distributed representations of words that indirectly use information about the morphemic composition through the averaging of vectors of same-root words. Morphemic analysis models for the Russian language are evaluated on samples of common and rare words. Several methods are proposed for obtaining ...
Added: November 9, 2020
, , , Journal of Physics: Conference Series 2019 Vol. 1405(1) No. DOI: 10.1088/1742-6596/1405/1/012011б
In the paper the case is studied then semiotic signs can be represented as language constructs in the same language as the text for the interpretation. The goal is to obtain estimates of the depth of interpretability with the respect to each of the signs by finding the projections of the narrative on these language ...
Added: June 28, 2021
, , , Programming and Computer Software 2015 Vol. 41 No. 6 P. 336-349
Applications related to domain specific text processing often use glossaries and ontologies, and the main step of such resource construction is term recognition. This paper presents a survey of existing definitions of the term and its linguistic features, formulates the task definition for term recognition, and analyzes presently-available methods for automatic term recognition, such as ...
Added: August 26, 2016
, , et al., Труды Института системного программирования РАН 2014 Т. 26 С. 421-438
he paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast scalable solution for text mining without the expensive customization. Depending on use-cases Texterra could be utilized ...
Added: November 6, 2017
2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021)
Association for Computational Linguistics, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...
Added: August 31, 2021
The Applications of Sentiment Analysis for Russian Language Texts: Current Challenges and Future Perspectives
, IEEE Access 2020 Vol. 8 P. 110693-110719
Sentiment analysis has become a powerful tool in processing and analysing expressed opinions on a large scale. While the application of sentiment analysis on English-language content has been widely examined, the applications on the Russian language remains not as well-studied. In this survey, we comprehensively reviewed the applications of sentiment analysis of Russian-language content and ...
Added: June 24, 2020
TALN-RECITAL 2014 Workshop TALAf 2014 : Traitement Automatique des Langues Africaines (TALAf 2014: African Language Processing)
Marseille : Association pour le Traitement Automatique des Langues, 2014
Dans la suite du premier atelier TALAf qui s'est tenu le 8 juin 2012 à Grenoble, lors de la conférence JEP-TALN-RECITAL 2012 (voir les actes : http://aclweb.org/anthology//W/W12/#1300), nous proposons une nouvelle édition de cet atelier lors de la conférence TALN 2014 le premier juillet à Marseille. Cette deuxième édition montre l'intérêt d'un atelier francophone sur le traitement ...
Added: March 26, 2015