The Taming of the Polysemy: Automated Word Sense Frequency Estimation for Lexicographic Purposes

A. Lopukhina; Лопухин К. А.; B. Iomdin; Носырев Г. В.

Although word sense frequency information is important for the theoretical study of polysemy and for practical lexicography, sense frequency distribution is a neglected area in linguistics, probably because sense frequency is not easy to estimate. In this paper we deal with the problem of automated word sense frequency estimation for Russian nouns. We developed and tested an automated system based on semantic context vectors, supplied with contexts and collocations from the Active Dictionary of Russian, a full-fledged production dictionary that reflects contemporary Russian. The study was performed on the RuTenTen11 web corpus. This approach allows us to reach a frequency estimation error of 11% without any additional labeled data. We compared the sense frequencies obtained automatically with the sense ordering in different dictionaries for several words. The method presented in this paper can be applied to any language with a sufficiently large corpus and a good dictionary that provides examples for each sense. The results may enrich language learning resources and help lexicographers order the senses within a word's entry by frequency if needed.
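The abstract gives only the outline of the system, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that each sense is represented by the mean vector of its dictionary examples, that each corpus context is assigned to the most similar sense by cosine similarity, and that sense frequency is the share of contexts assigned to that sense. The toy two-dimensional embeddings, the sense labels, and all function names are hypothetical.

```python
import math
from collections import Counter

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def context_vector(words, embeddings):
    """Average the embeddings of the known words; None if no word is known."""
    known = [embeddings[w] for w in words if w in embeddings]
    return mean_vector(known) if known else None

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def estimate_sense_frequencies(sense_examples, corpus_contexts, embeddings):
    """Assumed scheme: nearest sense vector per context, then relative counts."""
    sense_vectors = {}
    for sense, examples in sense_examples.items():
        vecs = [context_vector(e, embeddings) for e in examples]
        vecs = [v for v in vecs if v is not None]
        if vecs:
            sense_vectors[sense] = mean_vector(vecs)
    if not sense_vectors:
        return {}
    counts = Counter()
    for words in corpus_contexts:
        vec = context_vector(words, embeddings)
        if vec is None:
            continue  # skip contexts with no known words
        best = max(sense_vectors, key=lambda s: cosine(vec, sense_vectors[s]))
        counts[best] += 1
    total = sum(counts.values())
    return {s: counts[s] / total for s in sense_vectors} if total else {}

# Toy data (hypothetical): two senses of one noun, 2-D "embeddings".
embeddings = {
    "money": [1.0, 0.0], "loan": [1.0, 0.1], "deposit": [0.9, 0.1],
    "water": [0.0, 1.0], "shore": [0.1, 1.0], "fish": [0.0, 1.0],
}
sense_examples = {
    "finance": [["money", "loan"]],   # stands in for dictionary examples
    "river": [["water", "shore"]],
}
corpus_contexts = [
    ["deposit", "money"], ["loan"], ["fish", "water"],
    ["money", "loan", "deposit"],
]
freqs = estimate_sense_frequencies(sense_examples, corpus_contexts, embeddings)
print(freqs)
```

In a real setting the embeddings would come from a model trained on a large corpus, and the estimated frequencies would be checked against a hand-labeled sample to measure the estimation error.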

Language: English

Keywords: semantics, lexicography, word sense disambiguation, WSD, word sense frequency

In book

Proceedings of the XVII EURALEX International Congress. Lexicography and Linguistic Diversity (6 – 10 September, 2016)

Tbilisi: Ivane Javakhishvili Tbilisi State University, 2016.

Automated Word Sense Frequency Estimation for Russian Nouns

Lopukhina A., Лопухин К. А., Носырев Г. В., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 79–94.

According to G. K. Zipf’s observation, there is a strong correlation between word frequency and polysemy. Yet word sense frequency distribution is a neglected area in computational linguistics. Furthermore, the study of sense frequency is of theoretical interest and has practical applications for lexicography and word sense disambiguation. Although WordNet and SemCor contain some information about sense frequency ...

Added: October 11, 2016

Lexical Variation: Word Knowledge and Polysemy in Russian Everyday Life Lexicon

Levin I., Andriyanets V., Iomdin B. et al., in: Computational Linguistics and Intellectual Technologies. Vol. 17. Issue 24. [s.n.], 2018. P. 414–423.

Many words that according to the dictionaries have just one meaning are in fact understood in different ways by different speakers. In this article we deal with Russian nouns denoting everyday life objects which are subject to much variation by age, gender, and region and are poorly described by the existing dictionaries. We report the ...

Added: September 20, 2018

Активный словарь русского языка

Апресян Ю. Д., Apresyan V., Бабаева Е. Э. et al., Moscow: Языки славянской культуры, 2014.

The present Active Dictionary of the Russian Language is an innovative product, the first dictionary of this type in Russian lexicography. It is created on the basis of the latest theoretical achievements in the following areas: a) theoretical linguistics (the principle of lexicon as a system, the principle of integrated linguistic descriptions); b) semantics (fundamental ...

Added: April 7, 2015

Negation and Valencies of Russian Predicates

Iomdin B., Iomdin L., in: Meaning Text Theory: Current Developments. Issue 85. Muenchen: Wiener Slawistischer Almanach, 2013.

The paper discusses the semantic interaction of negation with certain types of verbal predicates in Russian, which involves, depending on the predicate type and its main valency structure, the emergence of new semantic valencies: the valency of the missing distance, the valency of the missing time span, and the valency of the missing quantity. ...

Added: August 20, 2014

Constructicography: Constructicon development across languages

Philadelphia, Amsterdam: John Benjamins Publishing Company, 2018.

In constructionist theory, a constructicon is an inventory of constructions making up the full set of linguistic units in a language. In applied practice, it is a set of construction descriptions – a “dictionary of constructions”. The development of constructicons in the latter sense typically means combining principles of both construction grammar and lexicography, and ...

Added: October 11, 2018

Word Sense Frequency of Similar Polysemous Words in Different Languages

Iomdin B., Lopukhina A., Лопухин К. А. et al., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 214–225

When words have several senses, it is important to describe them properly in a dictionary (a lexicographic task) and to be able to distinguish them in a given context (a computational linguistics task, WSD). Different senses normally have different frequencies in corpora. We introduced several techniques for determining sense frequency based on dictionary entries matched with ...

Added: October 11, 2016

Что такое орехи?

Iomdin B., in: Компьютерная лингвистика и интеллектуальные технологии. По материалам ежегодной Международной конференции "Диалог" (2015). Moscow: Изд-во РГГУ, 2015. P. 210–224.

When describing words which denote real-life objects, dictionaries tend to use scientific terms and classifications, even when dealing with natural language. This approach may lead to misunderstanding, especially in cases when scientific classification (e.g. in biology) differs from what is found in natural language data. One such case is discussed here, namely the ...

Added: June 6, 2015

Толковый словарь русской разговорной речи. Вып. 3: П–Р

Гловинская М. Я., Голанова Е. И., Ермакова О. П. et al., Moscow: Издательский дом ЯСК, 2019.

The dictionary describes the vocabulary of contemporary colloquial Russian. It is experimental in nature and, unlike most academic explanatory dictionaries, is not normative. The compilers' aim was to reflect in dictionary form, as fully as possible, the semantic, grammatical, combinatorial, and stylistic properties of the lexical and phraseological means used in the everyday speech of the contemporary city dweller, as well as the particulars of their use in various ...

Added: November 14, 2019

Интерпретация имен собственных в косвенных контекстах: именование de re и фикции

Mikirtumov I., Слово.ру: балтийский акцент 2022 Vol. 13 No. 2 P. 75–98

The author explores the meaning of proper names and other types of singular terms in the context of propositional attitudes, combining the problems of empty names, rigid designators and non-specific reading. An object in the attitudes can be given to the agent as such (de re), in the description (de dicto), as well as in ...

Added: August 16, 2023

Когнитивные исследования языка. Вып. XXIII : Лингвистические технологии в гуманитарных исследованиях : сборник научных трудов

Moscow, Tambov: Институт языкознания РАН; Издательский дом ТГУ им. Г.Р. Державина, 2015.

The collection presents materials reflecting current directions in the study of linguistic technologies in humanities research. It discusses, from a cognitive point of view, issues of knowledge transfer, the use of cognitive technologies in the modern humanities, and the study of linguistic consciousness by means of these technologies. Particular attention is paid to the technologies of cognitive research in linguistics, in particular technologies for modeling and interpreting language and speech at the grammatical ...

Added: November 26, 2015

Exploring Semantic Concreteness and Abstractness for Metaphor Identification and Beyond

Badryzlova Y., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 17 июня – 20 июня 2020 г.). Issue 19(26). Moscow: Изд-во РГГУ, 2020. P. 33–47.

The paper presents a method for computing indexes of semantic concreteness and abstractness in two languages (Russian and English). These indexes are used in metaphor identification experiments in both languages; the results are either comparable to or surpass previous work and the baselines. We analyze the obtained indexes of concreteness and abstractness to see how ...

Added: August 24, 2020

Комментируя Пиндара (Pyth. 5.10–11): что делает Кастор — «заливает очаг» или «воспламеняет очаг»?

Akhunova O., Шаги/Steps 2024 Vol. 10 No. 2 P. 89–100

The task of this short paper is to show, using the example of v. 10–11 of Pindar’s Fifth Pythian Ode, that the semantics of the verb καταιθύσσω is misunderstood, and that this erroneous understanding has been recorded in almost all dictionaries, starting with the dictionary LSJ. As a result, the meaning of many poetic contexts, ...

Added: July 5, 2024

Вторая международная конференция по семантике и прагматике «HSE Semantics & Pragmatics Workshop» : Москва, 4–5 сентября 2018

Smirnov M., Философия. Журнал Высшей школы экономики 2018 Vol. 2 No. 4 P. 253–264

Academic report: Second International Conference ‘HSE Semantics & Pragmatics Workshop’ : Moscow, September 4–5, 2018 ...

Added: January 15, 2019

2-в-1: две посессивные конструкции в одной парадигме севернохантыйских суффиксов

Mikhailov S., in: Малые языки в большой лингвистике. Issue 3. Moscow: Буки Веди, 2021. P. 106–117.

This paper argues that there are at least two adnominal possessive constructions in the Kazym dialect of Northern Khanty, both of which are exponed by the same set of suffixes. The arguments for this view include the (un)availability of explicit possessor mention, the presence of a uniqueness requirement, and the availability of context-dependent associative possessive ...

Added: December 29, 2021

On Minimal and Maximal Suffixes of a Substring

Maxim Babenko, Ignat Kolesnichenko, Starikovskaya T., Lecture Notes in Computer Science 2013 Vol. 7922 P. 28–37

Lexicographically minimal and lexicographically maximal suffixes of a string are fundamental notions of stringology. It is well known that the lexicographically minimal and maximal suffixes of a given string S can be computed in linear time and space by constructing a suffix tree or a suffix array of S. Here we consider the case when ...

Added: November 13, 2013

Специфические слова и выражения русских классиков XIX века: опыт контрастивного корпусного исследования

Orekhov B., Ученые записки Петрозаводского государственного университета. Серия: Общественные и гуманитарные науки 2019 No. 5 P. 70–75

The paper presents the results of a quantitative study that identifies characteristic and specific low-frequency words for the prose of Russian classic writers of the XIX century. TF-IDF measure and a large collection of the XIX century texts by Turgenev, Goncharov, Leskov and Dostoevsky were used to identify words and phrases that are rarely found ...

Added: September 18, 2019

Semantic Artificial Intelligence

Kharlamov A. A., in: Lecture Notes in Networks and Systems (LNNS, volume 231). Proceedings of 5th Computational Methods in Systems and Software 2021 (CoMeSySo 2021). Vol. 2: Data Science and Intelligent Systems. Issue 231. Springer, 2021.

Added: January 5, 2022

Лексикология английского языка

Киселева С. В., Кононова И. В., Trofimova N., St. Petersburg, 2022.

This textbook is intended for students studying in the bachelor's degree program "Linguistics" and preparing for the exam in the discipline "Fundamentals of the theory of the first foreign language". The manual aims to give students an idea of the specifics of the vocabulary of the modern English language, the origin of words, the problems of the meaning of ...

Added: April 9, 2023

ВОСПРИЯТИЕ, ЗНАНИЕ И ЕСТЕСТВЕННЫЙ ЯЗЫК

Куслий П. С., Mikirtumov I., Эпистемология и философия науки 2022 Vol. 59 No. 2 P. 6–22

In this paper, we would like to argue in support of the productiveness of epistemological investigations at the interface of the semantics and pragmatics of natural language and the analysis of perception. We begin with a short overview of the history of the convergence of these two areas of research. Leibniz is the center of this historical discussion. We ...

Added: August 16, 2023

О специфике словарей современного немецкого молодежного языка

Rossikhina M. Y., Вопросы лексикографии 2014 No. 2(6) P. 5–16

Many dictionaries of youth jargon (traditionally called youth slang) were published in Germany between 2000 and 2013. They fall into three categories. The first group consists of annual editions of multilingual dictionaries by the publishers PONS and Langenscheidt, which give words and collocations used by schoolchildren from Germany, Austria and Switzerland their ...

Added: January 11, 2015

После, через, спустя во временны́х контекстах: из наблюдений над текстами казахско-русских билингвов

Rakhilina E. V., Казкенова А. К., Akhapkina Y., Вестник Томского государственного университета. Филология 2021 Vol. 73 P. 93–113

The paper examines cases of non-standard use of the prepositions после, через, and спустя in temporal contexts by Kazakh-Russian bilinguals. It argues that the deviations are caused by grammatical differences between the speakers' native language and Russian. Analysis of the deviations revealed specific features of the prepositions: the ability to indicate the completion of events and time spans, both single and recurring, as well as the ambiguity of через in combinations with the names of different time intervals. ...

Added: December 1, 2021

Современное развитие славянской лексикологии и лексикографии. Международная коллективная монография

Moscow: Институт русского языка им. В.В. Виноградова РАН, 2022.

The monograph reflects trends in contemporary lexicography and presents experience in creating linguistic resources of various types, including unique and rare lexicographic projects. ...

Added: November 28, 2022

15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings

Springer, 2018.

The sixteen-volume set comprising the LNCS volumes 11205-11220 constitutes the refereed proceedings of the 15th European Conference on Computer Vision, ECCV 2018, held in Munich, Germany, in September 2018. The 776 revised papers presented were carefully reviewed and selected from 2439 submissions. The papers are organized in topical sections on learning for vision; computational photography; human analysis; ...

Added: October 30, 2018

Путеводитель по дискурсивным словам русского языка

Baranov A., Plungian V., Rakhilina E. V., Moscow: Помовский и партнеры, 1993.

Added: November 12, 2023