LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation

Davletov A.; Nikolay Arefyev; Gordeev D.; Rey A.

doi:10.18653/v1/2021.semeval-1.103

P. 780–786.

This paper presents our approaches to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation. The first approach reformulates the task as a question answering problem, while the second frames it as a binary classification problem. Our best system, an ensemble of XLM-R based binary classifiers trained with data augmentation, is among the 3 best-performing systems for Russian, French and Arabic in the multilingual subtask. In the post-evaluation period, we experimented with batch normalization, subword pooling and target word occurrence aggregation methods, resulting in further performance improvements.

Language: English

Keywords: natural language processing; computational lexical semantics

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Association for Computational Linguistics, 2021.

SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task

Razzhigaev A., Nikolay Arefyev, Panchenko A., in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 157–162.

In this paper, we present a system for the solution of the cross-lingual and multilingual word-in-context disambiguation task. Task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To address the lack of the officially provided cross-lingual training data, we decided to generate such data ourselves. We describe a simple ...

Added: September 23, 2021

Lexical, morphological and semantic correlates of the dark triad personality traits in Russian Facebook texts

Panicheva P., Bogolyubova O., Ledovaya Y., in: Proceedings of the Artificial Intelligence and Natural Language AINL FRUCT 2016 Conference, Saint-Petersburg, Russia, 10-12 November 2016. FRUCT Oy, 2016. P. 72–79.

*The social network Facebook is banned in Russia on the grounds of extremist activity. The presented project is intended to make use of growing amounts of textual data in social networks in the Russian language, in order to find linguistic correlates of the Dark Triad personality traits, comprising non-clinical Narcissism, Machiavellianism and Psychopathy. The background for the ...

Added: February 18, 2019

A Method for Semantic Search of Specialists with a Defined Set of Competencies

Zakhlebin I. V., in: E-Business. Internet Project Management. Innovations: Proceedings of the Student Research and Practice Conference, Moscow, March 12-14, 2013. Moscow: HSE University, 2014. P. 88–91.

The report deals with the methodology of building a system to perform search for specialists satisfying a defined set of competencies. The proposed search method is based on natural language texts analysis. ...

Added: July 11, 2015

Teaching a Massive Open Online Course on Natural Language Processing

Artemova E., Apishev M., Sarkisyan V. et al., in: Proceedings of the Fifth Workshop on Teaching NLP. Association for Computational Linguistics, 2021. Ch. 2 P. 13–27.

Added: September 27, 2021

Automatic Morphemic Analysis of Russian Words

Мальтина Л. П., Malafeev A., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. Aachen: CEUR Workshop Proceedings, 2018. Ch. 9 P. 85–94.

The paper considers the task of the morphemic analysis of Russian words and compares the efficiency of several proposed models. These models can be divided into three groups: derivational and inflectional rule-based, probabilistic, and hybrid models. The latter achieved state-of-the-art results of 0.848 F-score on a test set of 500 Russian words. The models ...

Added: February 15, 2019

A System for Knowledge Discovery in Big Dynamical Text Collections

Kuznetsov S., Neznanov A., Poelmans J., in: Proceedings, Workshop “What can FCA do for Artificial Intelligence?” of the ECAI 2012 conference. M.: CEUR Workshop Proceedings, 2012. Ch. 12 P. 81–87.

The software system Cordiet-FCA is presented, which is designed for knowledge discovery in big dynamic data collections, including texts in natural language. Cordiet-FCA allows one to compose ontology-controlled queries and outputs concept lattices, implication bases, association rules, and other useful concept-based artifacts. Efficient algorithms for data preprocessing, text processing, and visualization of results are discussed. Examples ...

Added: January 30, 2013

The Advantages of Human Evaluation of Sociomedical Question Answering Systems

Фирсанова В. И., International Journal of Open Information Technologies 2021 Vol. 9 No. 12 P. 53–59

The paper presents a study on question answering systems evaluation. The purpose of the study is to determine if human evaluation is indeed necessary to qualitatively measure the performance of a sociomedical dialogue system. The study is based on the data from several natural language processing experiments conducted with a question answering dataset for inclusion of people with autism spectrum disorder and state-of-the-art ...

Added: September 25, 2023

Cross-Domain Limitations of Neural Models on Biomedical Relation Classification

Alimova I., Tutubalina E., Nikolenko S. I., IEEE Access 2022 Vol. 10 P. 1432–1439

Relation extraction (RE) aims to extract relational facts from plain text, which is essential to the biomedical research field with the rapid growth of biomedical literature and generally large volumes of biomedicine-related text coming from various sources. Numerous annotated corpora and state-of-the-art models have been introduced in the past five years. However, there are no ...

Added: April 10, 2023

Specialized Knowledge Mediation: Ontological & Metaphorical Modelling

Isaeva E., Manerko L., Manzhula O. et al., Springer, 2022.

This book provides an integrated approach to cognitive-linguistic mediation, with aims toward the efficiency of knowledge transfer and acquisition. Problems are approached through the prism of cognitive modelling, and mapped to such fields as intercultural and interdisciplinary communication, and second language teaching. The novelty lies in the synergies between linguistics, cognitive science, artificial intelligence, culture, ...

Added: March 14, 2022

The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews

Tutubalina E., Алимова И. С., Мифтахутдинов З. et al., Bioinformatics 2021 Vol. 37 No. 2 P. 243–249

Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analysis and comparison of patient’s health conditions and adverse drug reactions reported on the ...

Added: January 13, 2021

Methods for automatic term recognition in domain-specific text collections: A survey

Denis Turdakov, Astrakhantsev N., Fedorenko D., Programming and Computer Software 2015 Vol. 41 No. 6 P. 336–349

Applications related to domain specific text processing often use glossaries and ontologies, and the main step of such resource construction is term recognition. This paper presents a survey of existing definitions of the term and its linguistic features, formulates the task definition for term recognition, and analyzes presently-available methods for automatic term recognition, such as ...

Added: August 26, 2016

Building a Dictionary-Based Lemmatizer for Old Irish

Dereza O., in: Actes de la conférence conjointe JEP-TALN-RECITAL. Vol. 6: Celtic Language Technology Workshop. P.: [s.n.], 2016. P. 12–17.

This paper explores the problem of developing NLP tools for morphologically rich and orthographically inconsistent classical languages. It is a case study of building a lemmatizer for Old Irish using only a dictionary and an unlabeled corpus as sources of data. At the current stage, the lemmatizer shows 76.31% average recall score on a corpus ...

Added: October 5, 2017

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings

Springer, 2021.

This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic. The 27 full papers and 4 short papers presented ...

Added: October 7, 2020

Multimodal Discourse Trees in Forensic Linguistics

Galitsky B., Ilvovsky D., Goncharova E., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 22. [s.n.], 2023.

We extend the concept of a discourse tree (DT) in the discourse representation of text towards data of various forms and natures. The communicative DT (which includes speech act theory), the extended DT (which ascends to the level of multiple documents) and the entity DT (which tracks how discourse covers various entities) were defined previously in computational linguistics; we now proceed ...

Added: November 10, 2023

Exploration of register-dependent lexical semantics using word embeddings

Kutuzov A. B., Kuzmenko E., Marakasova A., in: Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). Osaka: [s.n.], 2016. P. 26–34.

We present an approach to detect differences in lexical semantics across English language registers, using word embedding models from distributional semantics paradigm. Models trained on register-specific subcorpora of the BNC corpus are employed to compare lists of nearest associates for particular words and draw conclusions about their semantic shifts depending on register in which they ...

Added: November 12, 2016

SyntaxNet Errors from the Linguistic Point of View

Durandin O., Malafeev A., Zolotykh N., in: Analysis of Images, Social Networks and Texts. 6th International Conference, 2017, Revised Selected Papers. Vol. 10716. Cham: Springer, 2018. Ch. 4 P. 34–46.

The paper deals with Google’s universal parser SyntaxNet. The system was used to analyze the Universal Dependencies linguistic corpora. We conducted an error analysis of the output of the parser to reveal to what extent the error types are connected with or preconditioned by the language types. In particular, we carried out several experiments, clustering ...

Added: December 1, 2017

Using Probability Distribution over Classes in Automatically Obtained Training Corpora

Durandin O., Hilal N., Strebkov D. et al., in: Proceedings of the ISMW-FRUCT 2016. [s.n.], 2016. P. 90–93.

The paper addresses a variation of the classification problem with class noise, in which each object in the training set is associated with a probability distribution over the class label set instead of a particular class label. This type of task is illustrated on a complex natural language processing problem: automatic Arabic dialect classification. ...

Added: January 17, 2017

Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science

Springer, 2021.

This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...

Added: October 28, 2021

Employing Wikipedia data for coreference resolution in Russian

Azerkovich I., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 107–112.

Semantic information has been deemed a valuable resource for solving the task of coreference resolution by many researchers. Unfortunately, not much has been done in the direction of using such information when working with Russian data. This work describes the first step of a research effort that attempts to create a coreference resolution system for Russian based on semantic data, concerned with ...

Added: September 5, 2018

A new formal approach to semantic parsing of instructions and to file manager design

Razorenov A., Fomichov V. A., in: Database and Expert Systems Applications. 27th International Conference, DEXA 2016, Porto, Portugal, September 5-8, 2016, Proceedings. Part I. Vol. 9827. Dordrecht, London, Heidelberg, New York, Cham: Springer, 2016. P. 416–430.

During roughly the last seven years, an increase of interest in semantic parsing of instructions in natural language (NL) could be observed. The principal applications of the developed algorithms are NL-interfaces for interaction with robots and video game characters, navigation in virtual space, and developing programs by means of NL. However, the known algorithms ...

Added: October 18, 2016

Understanding Meaning and Knowledge Representation: from Theoretical and Cognitive Linguistics to Natural Language Processing

Newcastle upon Tyne: Cambridge Scholars Publishing, 2015.

Today, there is a need to develop natural language processing (NLP) systems from deeper linguistic approaches. Although there are many NLP applications which can work without taking into account any linguistic theory, this type of system can only be described as “deceptively intelligent”. On the other hand, however, those computer programs requiring some language comprehension ...

Added: January 24, 2016

Information Models in Natural Language Text Processing Tasks. Second Edition, Revised

Chepovskiy A., Moscow: National Open University "INTUIT", 2015.

The monograph examines various mathematical models for solving practical problems in natural language text processing. It proposes solutions to problems that arise in organizing indexing and subsequent data retrieval. Methods of computational linguistics are applied to applied research. The book is intended for developers of information systems and specialists in computational linguistics. ...

Added: May 23, 2015

Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Lisbon: SciTePress, 2021.

Knowledge Engineering (KE) refers to all technical, scientific and social aspects involved in building, maintaining and using knowledge-based systems. KE is a multidisciplinary field, bringing in concepts and methods from several computer science domains such as artificial intelligence, databases, expert systems, decision support systems and information systems. From the software development point of view, KE ...

Added: October 2, 2021

2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021)

Association for Computational Linguistics, 2021.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...

Added: August 31, 2021