SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task

Razzhigaev A.; Nikolay Arefyev; Panchenko A.

doi:10.18653/v1/2021.semeval-1.16

P. 157–162.

In this paper, we present a system for solving the cross-lingual and multilingual word-in-context disambiguation task. The task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To address this gap, we generated such data ourselves. We describe a simple yet effective approach, developed in the context of this shared task, based on machine translation and back-translation of the lexical units into the original language. In our experiments, we used a neural system based on XLM-R, a pre-trained transformer-based masked language model, as a baseline. We show that the proposed approach is effective: it substantially improves the performance of this strong neural baseline. In addition, we present several variants of the XLM-R-based classifier, experimenting with various ways of combining information from the first and second occurrences of the target word in the two samples.
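The abstract describes the data-generation idea only at a high level. The Python sketch below shows one possible reading of it, not the authors' implementation. A monolingual word-in-context pair becomes a cross-lingual one by translating the second sentence. The target word is then located in the translation by back-translating each candidate word and matching it against the original lemma. The functions `toy_translate_sentence` and `toy_back_translate_word` are hypothetical stand-ins for a real machine translation system, implemented here as small lookup tables. Translating only the second sentence and copying the label unchanged are also assumptions.

```python
from typing import Callable, List, Optional, Tuple

# (lemma, sentence1, sentence2, label); label "T" means the lemma has the
# same meaning in both sentences, "F" means it does not.
WicExample = Tuple[str, str, str, str]
# (lemma, sentence1, translated sentence2, target word in translation, label)
CrossLingualExample = Tuple[str, str, str, str, str]

# Hypothetical toy "MT system": English -> French sentence lookup table.
TOY_SENTENCES = {
    "The bank approved the loan.": "La banque a approuvé le prêt.",
    "He sat on the river bank.": "Il s'est assis sur la rive.",
    "They bank on his support.": "Ils comptent sur son soutien.",
}

# Hypothetical toy word-level back-translation table: French -> English.
TOY_WORDS_FR_EN = {
    "banque": "bank",
    "rive": "bank",
    "prêt": "loan",
    "sur": "on",
    "comptent": "count",
    "soutien": "support",
}


def toy_translate_sentence(sentence: str) -> str:
    # Unknown sentences translate to "", so they yield no candidate words.
    return TOY_SENTENCES.get(sentence, "")


def toy_back_translate_word(word: str) -> str:
    return TOY_WORDS_FR_EN.get(word, "")


def find_target_by_back_translation(
    translated: str, lemma: str, back_translate_word: Callable[[str], str]
) -> Optional[str]:
    """Return the first word of `translated` whose back-translation equals
    `lemma`, or None if no word matches."""
    for token in translated.split():
        word = token.strip(".,;:!?\"'").lower()
        if word and back_translate_word(word) == lemma:
            return word
    return None


def make_cross_lingual(
    examples: List[WicExample],
    translate_sentence: Callable[[str], str],
    back_translate_word: Callable[[str], str],
) -> List[CrossLingualExample]:
    """Translate sentence2 of each pair and keep the pair only if the target
    word can be located in the translation. The label is copied unchanged,
    assuming translation preserves the sense of the target word."""
    result: List[CrossLingualExample] = []
    for lemma, s1, s2, label in examples:
        translated = translate_sentence(s2)
        target = find_target_by_back_translation(translated, lemma, back_translate_word)
        if target is not None:
            result.append((lemma, s1, translated, target, label))
    return result


examples: List[WicExample] = [
    ("bank", "I deposited cash at the bank.", "The bank approved the loan.", "T"),
    ("bank", "I deposited cash at the bank.", "He sat on the river bank.", "F"),
    ("bank", "I deposited cash at the bank.", "They bank on his support.", "F"),
]
pairs = make_cross_lingual(examples, toy_translate_sentence, toy_back_translate_word)
for pair in pairs:
    print(pair)
```

In this sketch, the third pair is dropped because no word in its translation back-translates to "bank". Discarding pairs whose target word cannot be recovered, rather than guessing a position, trades coverage for cleaner training labels.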

Language: English

Keywords: natural language processing, computational lexical semantics

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Association for Computational Linguistics, 2021.

LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation

Davletov A., Nikolay Arefyev, Gordeev D. et al., in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 780–786.

This paper presents our approaches to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation task. The first approach attempted to reformulate the task as a question answering problem, while the second one framed it as a binary classification problem. Our best system, which is an ensemble of XLM-R based binary classifiers trained with data augmentation, ...

Added: September 23, 2021

Exploration of register-dependent lexical semantics using word embeddings

Kutuzov A. B., Kuzmenko E., Marakasova A., in: Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). Osaka: [s.n.], 2016. P. 26–34.

We present an approach to detecting differences in lexical semantics across English language registers, using word embedding models from the distributional semantics paradigm. Models trained on register-specific subcorpora of the BNC corpus are employed to compare lists of nearest associates for particular words and draw conclusions about their semantic shifts depending on register in which they ...

Added: November 12, 2016

Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Association for Computational Linguistics, 2019.

Added: September 15, 2020

Using the principles of the “regulatory guillotine” and computational law methods to analyze requirements for the quality of higher education

Knyaginina N., Jankiewicz S., Tikhonov E., Public Administration Issues 2022 No. 1 P. 78–100

Russia is currently carrying out the “regulatory guillotine” reform of control and supervisory activity, which is designed to significantly reduce the number of mandatory requirements in legislation, retaining only those that are necessary and subject to oversight. This article applies the principles of this reform to the Federal State Educational Standards (FSES). Russian legislation understands the quality of education ...

Added: April 6, 2022

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorizing textual information, the selection of a model for scientific paper classification, and the training of the BERT language model ...

Added: November 4, 2023

Cross-Domain Limitations of Neural Models on Biomedical Relation Classification

Alimova I., Tutubalina E., Nikolenko S. I., IEEE Access 2022 Vol. 10 P. 1432–1439

Relation extraction (RE) aims to extract relational facts from plain text, which is essential to the biomedical research field with the rapid growth of biomedical literature and generally large volumes of biomedicine-related text coming from various sources. Numerous annotated corpora and state-of-the-art models have been introduced in the past five years. However, there are no ...

Added: April 10, 2023

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

P.: European Language Resources Association (ELRA), 2018.

Book of abstracts from the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) ...

Added: May 5, 2018

Lost in Conversation: A Conversational Agent Based on the Transformer and Transfer Learning

Golovanov S., Tselousov A., Rauf Kurbanov et al., in: The NeurIPS '18 Competition: From Machine Learning to Intelligent Conversations. Springer, 2020. P. 295–315.

Added: February 20, 2021

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 29 – June 1, 2019)

Moscow: Publishing Center of the Russian State University for the Humanities, 2019.

The book includes 64 papers submitted to the International Conference on Computational Linguistics and Intellectual Technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research on natural language description, language simulation, and the creation of applied computer technologies. ...

Added: October 16, 2019

Investor sentiment and the NFT hype index: to buy or not to buy?

Baklanova V., Kurkin A., Teplova T., China Finance Review International 2024 Vol. 14 No. 3 P. 522–548

Purpose – The primary objective of this research is to provide a precise interpretation of the constructed machine learning model and produce definitive summaries that can evaluate the influence of investor sentiment on the overall sales of non-fungible token (NFT) assets. To achieve this objective, the NFT hype index was constructed as well as several approaches of ...

Added: December 10, 2023

Decomposing Textual Information For Style Transfer

Ivan P. Yamshchikov, Shibaev V., Nagaev A. et al., in: Proceedings of the 3rd Workshop on Neural Generation and Translation. Association for Computational Linguistics, 2019. P. 128–137.

This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher quality of information decomposition corresponds to higher performance in terms ...

Added: January 7, 2021

Knowledge Engineering and Semantic Web

Switzerland: Springer, 2015.

This book constitutes the refereed proceedings of the 6th Conference on Knowledge Engineering and the Semantic Web, KESW 2015, held in Moscow, Russia, in September/October 2015. The 17 revised full papers presented together with 6 short system descriptions were carefully reviewed and selected from 35 submissions. The papers address research issues related to semantic web, ...

Added: September 16, 2015

Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science

Springer, 2021.

This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...

Added: October 28, 2021

The Chekhov Digital project: objectives and challenges of implementing semantic markup of texts (the case of A. P. Chekhov's story “Death of an Official”)

Северина Е. М., Ларионова М. Ч., Litera 2023 No. 10 P. 211–222

The article considers a model for preparing machine-readable (semantic) markup of texts for the Chekhov Digital project, using as an example the philological interpretation of individual significant elements of A. P. Chekhov's story "Death of an Official" and the explicit presentation of this information based on the Text Encoding Initiative (TEI/XML) digital publication standards. Based ...

Added: January 12, 2024

SyntaxNet Errors from the Linguistic Point of View

Durandin O., Malafeev A., Zolotykh N., in: Analysis of Images, Social Networks and Texts. 6th International Conference, 2017, Revised Selected Papers. Vol. 10716. Cham: Springer, 2018. Ch. 4 P. 34–46.

The paper deals with Google’s universal parser SyntaxNet. The system was used to analyze the Universal Dependencies linguistic corpora. We conducted an error analysis of the output of the parser to reveal to what extent the error types are connected with or preconditioned by the language types. In particular, we carried out several experiments, clustering ...

Added: December 1, 2017

Style transfer in NLP: a framework and multilingual analysis with Friends TV series

Tikhonova M., Elina Telesheva, Mirzoev S. et al., in: 2021 International Conference Engineering and Telecommunication (En&T). IEEE, 2022. P. 1–6.

Style transfer is an important and rapidly developing area of Natural Language Processing. More and more methods and models are being proposed that allow us to generate text in a predefined style. In this paper we propose a framework for style transfer for the “Friends” TV series. The trained models are able to mimic one of ...

Added: May 21, 2022

A method for semantic search of specialists with a defined set of competencies

Zakhlebin I. V., in: Electronic Business. Internet Project Management. Innovations: Collected Papers of the Participants of the Student Scientific and Practical Conference, Moscow, March 12–14, 2013. Moscow: HSE, 2014. P. 88–91.

The report deals with the methodology of building a system to perform search for specialists satisfying a defined set of competencies. The proposed search method is based on natural language texts analysis. ...

Added: July 11, 2015

Using the information theory of speech perception to analyze speech quality

Karpov N., in: Modern Problems of Informatization in the Analysis and Synthesis of Technological and Software-Telecommunication Systems: Collected Papers. Issue 17. Voronezh: Nauchnaya Kniga, 2012. P. 264–266.

Added: November 7, 2012

LIORI at SemEval-2021 Task 8: Ask Transformer for measurements

Davletov A., Gordeev D., Nikolay Arefyev et al., in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 1249–1254.

This work describes our approach for subtasks of SemEval-2021 Task 8: MeasEval: Counts and Measurements which took the official first place in the competition. To solve all subtasks we use multi-task learning in a question-answering-like manner. We also use learnable scalar weights to weight subtasks’ contribution to the final loss in multi-task training. We fine-tune ...

Added: September 23, 2021

The Advantages of Human Evaluation of Sociomedical Question Answering Systems

Фирсанова В. И., International Journal of Open Information Technologies 2021 Vol. 9 No. 12 P. 53–59

The paper presents a study on question answering systems evaluation. The purpose of the study is to determine if human evaluation is indeed necessary to qualitatively measure the performance of a sociomedical dialogue system. The study is based on the data from several natural language processing experiments conducted with a question answering dataset for inclusion of people with autism spectrum disorder and state-of-the-art ...

Added: September 25, 2023

Analysis of Images, Social Networks and Texts: Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10–12, 2014, Revised Selected Papers

Berlin: Springer, 2014.

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Multiple features for clinical relation extraction: A machine learning approach

Alimova I., Tutubalina E., Journal of Biomedical Informatics 2020 Vol. 103 P. 1–9

Relation extraction aims to discover relational facts about entity mentions from plain texts. In this work, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding ...

Added: October 28, 2020

Assessment of Dendritic Cell Therapy Effectiveness Based on the Feature Extraction from Scientific Publications

Luparov A., Panov A. I., Suvorov R. et al., in: Proceedings of ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods. Vol. 2. SciTePress, 2015. P. 270–276.

Dendritic cell (DC) vaccination is a promising way to combat cancer metastases, especially in the case of immunogenic tumors. Unfortunately, it is only rarely possible to achieve a satisfactory clinical outcome in the majority of patients treated with a particular DC vaccine. Apparently, DC vaccination can be successful with certain combinations of features of the ...

Added: November 20, 2015

The Language of Comment in Social Networks: an Overview of Morphological and Syntactic Features

Karpov I., Крылова Т. В., Timoshenko S., Scando-Slavica 2022 P. 1–20

In this paper we describe the difference between informal comments, posted on social networks, and internet journalistic style texts, which tend to be written in a Codified Literary Russian. We performed a quantitative analysis of more than0 graphic, morphological, syntactic features, and supplied statistically significant features with the linguistic interpretation. The article concluded that the ...

Added: October 31, 2021