Word Sense Frequency of Similar Polysemous Words in Different Languages

B. Iomdin; A. Lopukhina; K. A. Lopukhin; G. V. Nosyrev

Computational Linguistics and Intellectual Technologies (Компьютерная лингвистика и интеллектуальные технологии). 2016. No. 15. P. 214–225.

When words have several senses, it is important to describe them properly in a dictionary (a lexicographic task) and to be able to distinguish them in a given context (a computational linguistics task, word sense disambiguation, or WSD). Different senses normally have different frequencies in corpora. We introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency is useful not only for explanatory lexicography and WSD; it may also enrich language learning resources. Learners of a foreign language who encounter a word similar to one in their native language are often tempted to assume that the foreign word and its equivalent have the same meaning structure. Sometimes, however, this is not the case, and the most frequent sense of a word in one language may be much less frequent for its cognate. We proposed a method for detecting such cases. Having selected a set of Russian words from the Active Dictionary of Russian that have more than two dictionary senses and have cognates in English, we estimated the frequencies of the English and Russian senses using SemCor and the Russian National Corpus respectively, matched the senses in each pair of words, and compared their frequencies. In this way we revealed cases in which the most frequent senses and the whole meaning structures differ substantially across the two languages, and we studied them in more detail. This technique can be applied not only to cognates but also to pairs of words that dictionaries usually offer as translation equivalents of each other.
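As a rough illustration of the comparison step described in the abstract, the following Python sketch normalizes per-sense counts for a Russian word and its English cognate, projects the Russian senses onto the English ones through a manual sense mapping, and reports whether the most frequent senses coincide. Everything specific here is an assumption made for illustration: the function names, the placeholder sense labels and counts, the shape of the sense mapping, and the use of total variation distance as the divergence measure. The abstract does not say which measure the authors used.

```python
from typing import Dict, Optional


def relative_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw per-sense counts from an annotated corpus into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sense occurrences counted")
    return {sense: n / total for sense, n in counts.items()}


def compare_cognates(
    ru_counts: Dict[str, int],
    en_counts: Dict[str, int],
    sense_map: Dict[str, Optional[str]],
) -> Dict[str, object]:
    """Compare the sense frequency profiles of a Russian word and its English cognate.

    sense_map (hypothetical format) sends each Russian sense label to the matching
    English sense label, or to None when the sense has no English counterpart.
    """
    ru = relative_frequencies(ru_counts)
    en = relative_frequencies(en_counts)

    ru_top = max(ru, key=ru.get)
    en_top = max(en, key=en.get)

    # Project Russian frequencies onto English sense labels; mass belonging to
    # unmatched Russian senses is collected separately.
    projected: Dict[str, float] = {}
    unmatched = 0.0
    for sense, freq in ru.items():
        target = sense_map.get(sense)
        if target is None:
            unmatched += freq
        else:
            projected[target] = projected.get(target, 0.0) + freq

    # Total variation distance (assumed measure): 0 means identical profiles,
    # 1 means the two profiles share no probability mass.
    labels = set(projected) | set(en)
    divergence = 0.5 * (
        sum(abs(projected.get(l, 0.0) - en.get(l, 0.0)) for l in labels) + unmatched
    )

    return {
        "ru_top_sense": ru_top,
        "en_top_sense": en_top,
        "same_top_sense": sense_map.get(ru_top) == en_top,
        "divergence": divergence,
    }


# Placeholder labels and counts, not data from the paper.
result = compare_cognates(
    ru_counts={"ru_1": 80, "ru_2": 15, "ru_3": 5},
    en_counts={"en_1": 10, "en_2": 90},
    sense_map={"ru_1": "en_1", "ru_2": "en_2", "ru_3": None},
)
print(result)
```

A pair would be flagged for closer study when `same_top_sense` is false or `divergence` exceeds some threshold; the choice of threshold is left open here.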

Research target: Computer Science; Philology and Linguistics

Priority areas: humanities

Language: English

Keywords: semantics, frequency, lexicography, polysemy, experiments, text corpora, meaning frequency

RUSSE2018: a Shared Task on Word Sense Induction for the Russian Language

Panchenko A., Lopukhina A., Ustalov D. et al., Computational Linguistics and Intellectual Technologies 2018 No. 17 P. 547–564

The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...

Added: June 7, 2018

Active Dictionary of the Russian Language (Активный словарь русского языка)

Apresyan Yu. D., Apresyan V., Babaeva E. E. et al., Moscow: Yazyki slavyanskoi kultury, 2014.

The present Active Dictionary of the Russian Language is an innovative product, the first dictionary of this type in Russian lexicography. It is created on the basis of the latest theoretical achievements in the following areas: a) theoretical linguistics (the principle of lexicon as a system, the principle of integrated linguistic descriptions); b) semantics (fundamental ...

Added: April 7, 2015

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Bekasovo, May 29 - June 2, 2013). In 2 volumes.

Moscow: RSUH, 2013.

The collection includes 84 papers from the international conference on computational linguistics and intellectual technologies Dialogue 2013, presenting a broad spectrum of theoretical and applied research on natural language description, the modeling of language processes, and the creation of practically applicable computational linguistic technologies. It is intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: May 13, 2013

Scientific Notes of SPb IVESEP (Ученые записки СПб ИВЭСЭП)

St. Petersburg: IVESEP, Znanie, 2015.

The collection contains articles by established and early-career scholars on current problems of philology (cognitive linguistics, lexical semantics, semiotics, pragmatics, text linguistics, stylistics; poetics, literary criticism; translation, intercultural communication). The issue also presents research on foreign language teaching methods. The edition is addressed to linguists, translators, teachers, postgraduates, students and a wide readership. ...

Added: March 30, 2016

Cross-Cultural and Cognitive Aspects of Interpreting Semiotic Information

Gorchakov Y. V., Taratuhina Y., Vestnik RGGU. Series "Informatics. Information Security. Mathematics" (Russian Federation) 2020 No. 2 P. 8–26

In this paper the authors consider the cross-cultural and cognitive aspects of semiotic information transmission, namely, how information is exchanged and interpreted by representatives of different cultural groups in the context of business processes. The article deals with the issues of subjective perception of information, the visual effectiveness of business process models and their ...

Added: October 16, 2020

Word Sense Disambiguation for Russian Verbs Using Semantic Vectors and Dictionary Entries

Lopukhina A., Lopukhin K. A., Computational Linguistics and Intellectual Technologies 2016 No. 15 P. 393–405

Word sense disambiguation (WSD) methods are useful for many NLP tasks that require semantic interpretation of input. Furthermore, such methods can help estimate word sense frequencies in different corpora, which is important for lexicographic studies and language learning resources. Although previous research on the disambiguation of Russian polysemous verbs established some important and interesting results, it was mostly ...

Added: October 11, 2016

Proceedings of the Fourth International Conference on Meaning–Text Theory

Observatoire de linguistique Sens-Texte, 2009.

These proceedings include papers on subjects from a wide number of areas including theoretical linguistics, translation, computational linguistics, natural language processing, and applied linguistics, focusing on a variety of languages, ranging from familiar Indo-European languages to Mandarin Chinese, Wolof, and Dene Sųɬiné. In order to make the papers available to the wider research community, these ...

Added: August 20, 2014

Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts

Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69–89

The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis tool that provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...

Added: June 24, 2021

Proceedings of the Eleventh International Conference on Computational Creativity

Coimbra: Association for Computational Creativity, 2020.

Added: September 29, 2020

Applying Corpus Linguistics Methods to Identify Context-Specific Words and Collocations

Gorina O. G., Vestnik of Pushkin Leningrad State University. Series: Economics 2011 Vol. 7 No. 3 P. 27–36

The article discusses how to compile and design one's own corpora to represent particular types of discourse. It also reviews the use of available corpus software to identify text-specific or genre-specific key words, and looks at corpus tools for identifying and measuring collocation strength using large national corpora. ...

Added: February 14, 2017

Innovative Use of NLP for Building Educational Applications

Stroudsburg, PA: Association for Computational Linguistics, 2019.

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications ...

Added: October 5, 2020

Digital Russia: The Language, Culture and Politics of New Media Communication

London: Routledge, 2014.

This book provides a comprehensive analysis of the ways in which new media technologies have shaped language and communication in contemporary Russia. It traces the development of the Russian-language internet (Runet) from late-Soviet cybernetics to the advent of Twitter and explores the evolution of web-based communication practices, showing how they have both shaped and been ...

Added: December 11, 2013

Thirteenth National Conference on Artificial Intelligence with International Participation KII-2012 (October 16-20, 2012, Belgorod, Russia). Volume 2

Belgorod: Belgorod State Technological University named after V. G. Shukhov, 2012.

The thirteenth national conference on artificial intelligence (KII-2012) is important because of the need to exchange scientific information and the latest achievements in the field. Leading scientists and specialists from academic institutes, scientific and industrial organizations, and universities in Russia and other countries take part in discussing the fundamental theoretical and applied problems that arise in building intelligent systems. ...

Added: November 13, 2012

Comparing two “thermometers”: Impact factors of 20 leading economic journals according to Journal Citation Reports and Scopus

Pislyakov V., Scientometrics 2009 Vol. 79 No. 3 P. 541–550

Impact factors for 20 journals ranked first by Journal Citation Reports (JCR) were compared with the same indicator calculated on the basis of citation data obtained from Scopus database. A significant discrepancy was observed as Scopus, though results differed from title to title, found in general more citations than listed in JCR. This also affected ...

Added: January 25, 2013

Proceedings of the 21st International Conference on Computational Linguistics "Dialogue"

Moscow: RSUH Publishing House, 2015.

The collection contains the proceedings of the 21st International Conference on Computational Linguistics. ...

Added: May 20, 2015

Predictions, Big Data, and New Measures: On the Potential of Computational Linguistics Technologies in Theoretical Linguistic Research

Bonch-Osmolovskaya A. A., Voprosy Jazykoznanija 2016 No. 2 P. 100–120

The article reviews recent studies in which a theoretical research problem is solved using methods or tools from computational linguistics. The review analyzes in detail how exactly applying a particular tool or method can yield new knowledge about the nature of language. In particular, it identifies two main directions whose development within ...

Added: April 14, 2015

Polysemy in Samoyed Basic Vocabulary Lists and Language Contacts

Fedotova I., Ural-Altaic Studies 2020 No. 2 (37) P. 77–113

This paper investigates cases of semantic shifts and proto-language polysemy in the Samoyed core lexicon. This research focuses on the shifts which have analogies in Turkic and Tungusic languages, identified with the help of semantic reconstruction. Special maps were created at LingvoDoc linguistic platform in order to demonstrate areas of similar polysemy and semantic shifts, ...

Added: October 19, 2020

PR in the Cultural Sphere

Tulchinskii G. L., St. Petersburg: Lan, 2011.

The textbook systematically covers the PR of organizations and institutions, including the goals and techniques of this activity and ways to analyze how effectively these tasks are carried out. The book is oriented largely toward PR in business activity and especially in the socio-cultural nonprofit sphere. The appendices contain materials and sample documents that are important for organizing PR in practice. The book can be used both for independent study of ...

Added: October 5, 2012

Exploring the Effectiveness of Methods for Persona Extraction

Konstantin Zaitsev. Series Computer Science "arxiv.org". 2024.

The paper presents a study of methods for extracting information about dialogue participants and evaluating their performance in Russian. To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models, resulting in improved data quality. A metric based on the F-score concept is presented to evaluate the effectiveness ...

Added: September 26, 2024

On the Specifics of Dictionaries of Contemporary German Youth Language

Rossikhina M. Y., Voprosy leksikografii 2014 No. 2(6) P. 5–16

Many dictionaries of youth jargon (traditionally called youth slang) were published in Germany between 2000 and 2013. They fall into three categories. The first group consists of annual editions of multilingual dictionaries by the publishers PONS and Langenscheidt, which give words and collocations used by schoolchildren from Germany, Austria and Switzerland their ...

Added: January 11, 2015

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 - June 1, 2019)

Moscow: Russian State University for the Humanities Publishing Center, 2019.

The book includes 64 papers submitted to the international conference on computational linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research on natural language description, language simulation, and the creation of applied computer technologies. ...

Added: October 16, 2019

Specific Words and Expressions of 19th-Century Russian Classics: A Contrastive Corpus Study

Orekhov B., Proceedings of Petrozavodsk State University. Series: Social Sciences and Humanities 2019 No. 5 P. 70–75

The paper presents the results of a quantitative study that identifies characteristic and specific low-frequency words in the prose of classic Russian writers of the 19th century. The TF-IDF measure and a large collection of 19th-century texts by Turgenev, Goncharov, Leskov and Dostoevsky were used to identify words and phrases that are rarely found ...

Added: September 18, 2019

The 26th International Conference on Computational Linguistics (COLING 2016)

[s.n.], 2016.

Added: December 1, 2016

Technological and Social Environments for Interactive Learning

Informing Science Press, 2013.

Technology Enhanced Learning (TEL) is a very broad and increasingly mature research field. It encompasses a wide variety of research topics, ranging from the study of different pedagogical approaches and teaching/learning strategies and techniques, to the application of advanced technologies in educational settings such as the use of different kinds of mobile devices, sensors and ...

Added: February 20, 2013