Word Sense Disambiguation for Russian Verbs Using Semantic Vectors and Dictionary Entries

A. Lopukhina; Лопухин К. А.

Computational Linguistics and Intellectual Technologies. 2016. No. 15. P. 393–405.

Word sense disambiguation (WSD) methods are useful for many NLP tasks that require semantic interpretation of input. Furthermore, such methods can help estimate word sense frequencies in different corpora, which is important for lexicographic studies and language learning resources. Although previous research on the disambiguation of Russian polysemous verbs established some important results, it focused mostly on reducing ambiguity or determining the most frequent sense rather than on evaluating WSD accuracy. To the best of our knowledge, there is no comprehensively evaluated method that can perform semi-supervised word sense disambiguation for Russian verbs. In this paper we present a WSD method for verbs that reaches an average disambiguation accuracy of 75% using only available linguistic resources: examples and collocations from the Active Dictionary of Russian and large unlabeled corpora. We evaluate the method on contexts sampled from the web-based corpus RuTenTen11 for 10 verbs, with 100 contexts for each verb. We compare different variations of the method and analyze its limitations. The method's implementation and labeled contexts are available online.
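
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to combine semantic vectors with dictionary entries, not the authors' implementation. It assumes that each sense is represented by the average vector of the words in its dictionary examples and collocations, and that a context is assigned the sense whose vector is closest by cosine similarity. The toy two-dimensional vectors, the English verb "run", the sense labels, and all function names are hypothetical; in practice the vectors would come from a model such as word2vec trained on a large unlabeled corpus.

```python
import math

# Toy 2-dimensional "semantic vectors" (hypothetical values for illustration).
TOY_VECTORS = {
    "marathon": [1.0, 0.0],
    "race": [0.9, 0.1],
    "legs": [0.8, 0.2],
    "company": [0.0, 1.0],
    "business": [0.1, 0.9],
    "factory": [0.2, 0.8],
}

# Hypothetical dictionary entry for the verb "run": sense -> words taken
# from that sense's examples and collocations.
SENSE_EXAMPLES = {
    "move_fast": ["race", "legs"],
    "manage": ["business", "factory"],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_sense_vectors(sense_examples, vectors):
    """Represent each sense by the mean vector of its dictionary words."""
    result = {}
    for sense, words in sense_examples.items():
        vec = mean_vector(words, vectors)
        if vec is not None:
            result[sense] = vec
    return result


def disambiguate(context_words, sense_vectors, vectors):
    """Return the sense whose vector is closest to the context vector."""
    ctx = mean_vector(context_words, vectors)
    if ctx is None or not sense_vectors:
        return None
    return max(sense_vectors, key=lambda s: cosine(ctx, sense_vectors[s]))


def accuracy(predicted, gold):
    """Share of contexts whose predicted sense matches the gold label."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


SENSE_VECTORS = build_sense_vectors(SENSE_EXAMPLES, TOY_VECTORS)
contexts = [["she", "ran", "a", "marathon"], ["he", "runs", "a", "company"]]
gold = ["move_fast", "manage"]
predicted = [disambiguate(c, SENSE_VECTORS, TOY_VECTORS) for c in contexts]
print(predicted, accuracy(predicted, gold))
```

Under the evaluation setup described above (10 verbs with 100 contexts each), an `accuracy`-style score would be computed per verb over its 100 labeled contexts and then averaged across the 10 verbs.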

Research target: Computer Science; Philology and Linguistics

Priority areas: humanitarian

Language: English

Keywords: polysemy, word2vec, word sense disambiguation, sense frequency, semantic vectors

Word Sense Frequency of Similar Polysemous Words in Different Languages

Iomdin B., Lopukhina A., Лопухин К. А. et al., Computational Linguistics and Intellectual Technologies 2016 No. 15 P. 214–225

When words have several senses, it is important to describe them properly in a dictionary (a lexicographic task) and to be able to distinguish them in a given context (a computational linguistics task, WSD). Different senses normally have different frequencies in corpora. We introduced several techniques for determining sense frequency based on dictionary entries matched with ...

Added: October 11, 2016

RUSSE2018: a Shared Task on Word Sense Induction for the Russian Language

Panchenko A., Lopukhina A., Ustalov D. et al., Computational Linguistics and Intellectual Technologies 2018 No. 17 P. 547–564

The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...

Added: June 7, 2018

A Method for Automatically Generating Lexico-Grammatical Exercises in the Wordbank Cloze Format

Malafeev A., Иностранные языки в высшей школе 2015 No. 2 (33) P. 88–95

Language exercises are widely used in teaching foreign languages; yet, manually creating exercises is labor-intensive and time-consuming. This paper describes a method for automatically generating EFL wordbank cloze exercises. These are generated from arbitrary passages in English, which is an important advantage in terms of learner motivation; indeed, the content of the exercises can be ...

Added: September 4, 2015

Innovative Use of NLP for Building Educational Applications

Stroudsburg, PA: Association for Computational Linguistics, 2019

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications ...

Added: October 5, 2020

Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts

Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69–89

The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is powerful text analysis software that provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...

Added: June 24, 2021

Review of the book: Wilken, Rowan: Teletechnologies, Place, and Community. New York, Routledge, 2011 // Digital Icons: Studies in Russian, Eurasian and Central European New Media, No 9 (2013): 129-133.

Gusejnov G., Digital Icons: Studies in Russian, Eurasian and Central European New Media 2013 No. 9 P. 129–133

In his book, Rowan Wilken, lecturer at the University of Swinburne, Australia, makes an attempt at providing a theoretical frame for a three-dimensional problem: the relation between new technologies, communities and places. His main goal is to sculpt an understanding of the relationship between place and community, both of which are transcended by what he ...

Added: March 24, 2014

Insights into the web based English learning projects

Frolova N., Frolov E. S., The Kazakh-American Free University Academic Journal 2017 P. 179–184

The article reflects practical experience of using web tools to enhance the teaching of Academic English Writing to undergraduate students. Along with a theoretical analysis of how blended learning is integrated into the curriculum, the article features an empirical survey that confirms the efficiency of the project. The article contains a detailed description ...

Added: June 5, 2018

Study Guide: English for Specific Purposes: Computer Security

Baranovskaya T., Klepko E. Y., Резниченко Е. М. et al., М.: Издательский дом ГУ-ВШЭ, 2009

This study guide is intended for third-year students of the Faculty of Business Informatics and meets the requirements of the bachelor's program in field 080700.62 "Business Informatics". The book is the first part of the course and is designed for use in the first and second modules. In the third year the program provides for the study of English for specific purposes, which determined the choice of topic: computer security. The guide ...

Added: May 14, 2013

Proceedings of the 21st International Conference on Computational Linguistics "Dialogue"

М.: Изд-во РГГУ, 2015

The volume contains the proceedings of the 21st International Conference on Computational Linguistics. ...

Added: May 20, 2015

The 26th International Conference on Computational Linguistics (COLING 2016)

[s.n.], 2016

Added: December 1, 2016

Thirteenth National Conference on Artificial Intelligence with International Participation KII-2012 (October 16–20, 2012, Belgorod, Russia). Volume 2

Белгород: Белгородский государственный технологический университет им. В.Г. Шухова, 2012

The importance of holding the thirteenth national conference on artificial intelligence (KII-2012) stems from the need to exchange scientific information and the latest achievements in the field. Leading scientists and specialists from academic institutes, research and industrial organizations, and universities in Russia and in neighboring and more distant countries take part in discussing the fundamental theoretical and applied problems that arise in building intelligent systems. ...

Added: November 13, 2012

Comparing two “thermometers”: Impact factors of 20 leading economic journals according to Journal Citation Reports and Scopus

Pislyakov V., Scientometrics 2009 Vol. 79 No. 3 P. 541–550

Impact factors for 20 journals ranked first by Journal Citation Reports (JCR) were compared with the same indicator calculated on the basis of citation data obtained from the Scopus database. A significant discrepancy was observed: although results differed from title to title, Scopus in general found more citations than were listed in JCR. This also affected ...

Added: January 25, 2013

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, June 17–20, 2020)

М.: Изд-во РГГУ, 2020

Papers from the Annual International Conference “Dialogue” (2020). Issue 19 ...

Added: June 26, 2020

Properties of Discourse Formulae: The Case of the Russian Constructions TY ČTO and ČTO TY

Bychkova P., Русский язык в научном освещении 2020 No. 2 (40) P. 88–111

The paper discusses semantic description of the so-called discourse formulae, idiomatic expressions used as speaker's reactions in a dialogue. They are considered in the framework of construction grammar, as a peripheral class of constructions with its specific properties. A case study of two synonymous Russian discourse formulae TY ČTO and ČTO TY provides an account ...

Added: September 23, 2020

Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16)

Association for Computational Linguistics, 2017

The volume includes papers presented at the 16th International Workshop on Treebanks and Linguistic Theories (TLT), which brings together developers and users of linguistically annotated natural language corpora. As ‘treebanks’ we consider any pairing of natural language data (spoken or written) with annotations of linguistic structure at various levels of analysis, ranging from e.g. morpho-phonology ...

Added: December 11, 2018

Technological and Social Environments for Interactive Learning

Informing Science Press, 2013

Technology Enhanced Learning (TEL) is a very broad and increasingly mature research field. It encompasses a wide variety of research topics, ranging from the study of different pedagogical approaches and teaching/learning strategies and techniques, to the application of advanced technologies in educational settings such as the use of different kinds of mobile devices, sensors and ...

Added: February 20, 2013

PR in the Cultural Sphere

Tulchinskii G. L., СПб.: Лань, 2011

The textbook systematically sets out the issues of PR for an organization or institution and covers the goals and techniques of this activity, as well as the possibilities for analyzing how effectively these tasks are solved. The book is oriented mostly toward PR in business activity and especially in the socio-cultural non-profit sphere. The appendices contain materials and sample documents that are important for the practical organization of PR. The book can be used both for independent familiarization with ...

Added: October 5, 2012

Proceedings of the Fourth International Conference on Cognitive Science

Tomsk: ., 2010

Added: November 18, 2013

Predictions, Big Data, and New Measuring Tools: On the Potential of Computational Linguistics Technologies in Theoretical Linguistic Research

Bonch-Osmolovskaya A. A., Вопросы языкознания 2016 No. 2 P. 100–120

The article surveys recent work in which a theoretical research problem is solved with methods or tools used in computational linguistics. The survey analyzes in detail exactly how applying a given tool or method can yield new knowledge about the nature of language. In particular, it identifies two main directions whose development within ...

Added: April 14, 2015

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 to June 1, 2019)

М.: Издательский центр «Российский государственный гуманитарный университет», 2019

The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...

Added: October 16, 2019

Polysemy in Samoyed Basic Vocabulary Lists and Language Contact

Fedotova I., Урало-алтайские исследования 2020 No. 2 (37) P. 77–113

This paper investigates cases of semantic shifts and proto-language polysemy in the Samoyed core lexicon. This research focuses on the shifts which have analogies in Turkic and Tungusic languages, identified with the help of semantic reconstruction. Special maps were created at LingvoDoc linguistic platform in order to demonstrate areas of similar polysemy and semantic shifts, ...

Added: October 19, 2020

23rd Conference of Open Innovations Association FRUCT, FRUCT 2018

IEEE Computer Society, 2018

23rd IEEE FRUCT Conference. ...

Added: November 1, 2020

Proceedings of the Eleventh International Conference on Computational Creativity

Coimbra: Association for Computational Creativity, 2020

Added: September 29, 2020

Exploring the Effectiveness of Methods for Persona Extraction

Konstantin Zaitsev / Cornell University. Series Computer Science "arxiv.org". 2024.

The paper presents a study of methods for extracting information about dialogue participants and evaluating their performance in Russian. To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models, resulting in improved data quality. A metric based on the F-score concept is presented to evaluate the effectiveness ...

Added: September 26, 2024