Word Sense Induction for Russian: Deep Study and Comparison with Dictionaries

Лопухин К. А.; B. Iomdin; A. Lopukhina

Computational Linguistics and Intellectual Technologies. 2017. Vol. 1. No. 16. P. 121–134.

The assumption that senses are mutually disjoint and have clear boundaries has been called into doubt by several linguists and psychologists. The problem of word sense granularity is widely discussed in both lexicographic and NLP studies. We aim to study word senses in the wild, that is, in raw corpora, by performing word sense induction (WSI). WSI is the task of automatically inducing the different senses of a given word, framed as an unsupervised learning task in which senses are represented as clusters of token instances. In this paper, we compared four WSI techniques: Adaptive Skip-gram (AdaGram), Latent Dirichlet Allocation (LDA), clustering of contexts, and clustering of synonyms. We evaluated them quantitatively and qualitatively and performed a deep study of the AdaGram method, comparing AdaGram clusters for 126 words (nouns, adjectives, and verbs) with their senses in published dictionaries. We found that AdaGram is quite good at distinguishing homonyms and metaphoric meanings. It ignores disappearing and obsolete senses but induces new and domain-specific senses that are sometimes absent from dictionaries. However, it works better for nouns than for verbs, ignoring structural differences (e.g. causative meanings or different government patterns). The AdaGram database is available online: http://adagram.ll-cl.org/.
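To make the idea of "senses represented as clusters of token instances" concrete, here is a minimal sketch of clustering contexts for one ambiguous word. It is not the implementation used in the paper. The bag-of-words features, the k-means-style loop with cosine similarity, the function names (`context_vector`, `cosine`, `induce_senses`), and the four toy sentences for the word "лук" (onion / bow) are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=3):
    """Count the tokens within `window` positions of the first `target` occurrence."""
    idx = tokens.index(target)
    lo, hi = max(0, idx - window), idx + window + 1
    return Counter(t for i, t in enumerate(tokens[lo:hi], start=lo) if i != idx)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_senses(contexts, target, k=2, iterations=10):
    """Assign each context to one of k clusters (hypothetical toy WSI routine)."""
    vectors = [context_vector(c.split(), target) for c in contexts]
    # Deterministic seeding: start from the first context, then repeatedly add
    # the context that is least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each context.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: a centroid is the summed counts of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            updated.append(sum(members, Counter()) if members else centroids[j])
        centroids = updated
    return labels


# Invented example sentences: two about the vegetable, two about the weapon.
contexts = [
    "повар мелко режет лук и морковь для супа",
    "в суп добавили лук и морковь",
    "охотник натянул лук и выпустил стрелу",
    "воин взял лук и стрелу",
]
labels = induce_senses(contexts, "лук", k=2)
print(labels)  # → [0, 0, 1, 1]
```

Each cluster label stands for one induced sense. A realistic setup would need lemmatization, stop-word handling, and a principled way to choose the number of clusters, none of which this sketch attempts.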

Priority areas: humanitarian; IT and mathematics

Language: English

Keywords: polysemy (многозначность, полисемия), lexical semantics, semantic vectors, word sense induction

Using predefined vector systems to speed up neural network multimillion class classification

Gabdullin N., Androsov I., / Series Computer Science "arxiv.org". 2026.

Label prediction in neural networks (NNs) has O(n) complexity proportional to the number of classes. This holds true for classification using fully connected layers and cosine similarity with some set of class prototypes. In this paper we show that if NN latent space (LS) geometry is known and possesses specific properties, label prediction complexity can ...

Added: April 2, 2026

Semantic integrity of a word structure and semantic primitives

Trofimova N., Pesina S. A., Vinogradova S. A. et al., Res Militaris 2022 Vol. 12 No. 2 P. 2111–2119

The article attempts to describe how the meanings of polysemous words are stored and function in the linguistic lexicon. To achieve this goal, we turned to research on semantic primitives discovered in the course of lexical analysis. Within the framework of an interdisciplinary approach to the problem of word-meaning ambiguity, the article justifies ...

Added: February 23, 2026

A naive picture of the world and a biosemantic approach to describing the lexical structure of a word

Trofimova N., Pesina S., Vinogradova S. et al., Revista EntreLinguas 2023 Vol. 9 No. 00

The problems of studying the lexical structure of a word extend into various areas of cognitive science, including biosemiotics. In the article, the biosemiotic approach is reframed into a biosemantic approach based on decoding specific lexical structures. The lexical invariants of polysemous words are shown to be meaningful cores of their figurative ...

Added: February 23, 2026

The lexeme "православный" (Orthodox) as an element of the "own vs. alien" opposition in IT discourse

Комышкова А. Д., In: Theoretical Semantics and Ideographic Lexicography: Dictionary. Discourse. Corpus: abstracts of the All-Russian scientific conference with international participation, October 17-18, 2024, Yekaterinburg. Yekaterinburg: Кабинетный ученый, 2024.

The article presents an analysis of the semantics of the lexeme православный (orthodox) based on non-standardized written speech on the Internet (using the subcorpus of social networks in the National Corpus of Russian Language). The vast majority of cases where православный is used in a derogatory or ironic sense are related to IT discourse. The meaning of ...

Added: February 19, 2026

Polysemy of agentive suffixes in Slavic languages: a cognitive-semantic analysis

Андреева А. А., Jezikoslovni Zapiski 2025 Vol. 31 No. 1 P. 133–163

The article analyzes the polysemy of agent-noun suffixes in six Slavic languages (Russian, Ukrainian, Polish, Czech, Serbian, and Slovene) using the "metonymic" approach to word formation developed by L. Janda (2011). It examines the semantic properties of the verbs to which agentive suffixes can attach and describes the semantic types represented by the derived nouns. The paper shows that agent-noun suffixes serve ...

Added: February 16, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Compound word and phrase: a corpus-based approach (the case of "bad blood")

Филатов А. С., Когнитивные исследования языка 2025 Vol. 1-2 No. 25 P. 302–305

The article demonstrates the productivity of corpus-based linguistic analysis regarding the problem of distinguishing phrases from compounds. The object of the research is “bad blood” in the American English language, the morphological status of which is approached in close connection with its real-life usage and the polysemies of its constituents. ...

Added: November 24, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025

An efficient stock market trading algorithm: a retrospective analysis based on S&P-500 data

Rubchinskiy A., Chubarova D., / Series WP7 "Mathematical methods for decision making in economics, business and politics". 2025. No. WP7/2025/01.

The article examines one of the most famous examples of socio-economic systems, characterized by significant uncertainty – the S&P-500 stock market, where shares of 500 largest US companies are traded. No assumptions are made about the probabilistic characteristics of the stock market. A flexible algorithm for daily trading has been developed, based on both known fixed data ...

Added: November 9, 2025

Diffusion on language model embeddings for protein sequence generation

Meshchaninov V., Strashnov P., Shevtsov A. et al., / Cornell University. Series CoRR, arXiv:2403.03726 "Computing Research Repository". 2025.

Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived ...

Added: October 5, 2025

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Shabalin A., Meshchaninov V., Vetrov D., / Series cs.CL, arXiv:2505.18853 "Computation and Language". 2025.

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respects discreteness but disregards semantic ...

Added: October 5, 2025

The effect of co-speech gestures on the interpretation of ambiguous sentences with negation and a quantifier

Добрынина А. И., RHEMA. РЕМА 2024 No. 4 P. 9–41

In Russian, quantifiers can be accompanied by various gestures, and, according to previous research, the semantics of a gesture can correlate with the semantics of the quantifier [Гришина 2015]. In this paper we hypothesize that gestures with universality semantics, produced simultaneously with an ambiguous sentence containing a quantifier, will favor interpreting the sentence as a universal statement. To test this hypothesis, an experiment was conducted: videos were recorded with ...

Added: October 2, 2025

Political Accommodation of Cultural Differences in Industrialized Societies

Малахов В. С., Симон М. Е., Летняков Д. Э. et al., / SSRN. Series Social Science Research Network "Social Science Research Network". 2020.

The notion of “political accommodation” applied to the theory and practice of managing cultural diversity could enrich the Russian academic dictionary. Liberal democratic states invented specific mechanisms for political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...

Added: September 26, 2025

A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis

Абрамов А. С., Chernyshev V. L., Mikhaylets E. et al., / Series Social Science Research Network "Social Science Research Network". 2025.

Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have significant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...

Added: September 23, 2025

National power of modern states: a comparative analysis. Analytical report

Melville A. Y., Каберник В. В., Mironyuk M. et al., / МГИМО МИД России. 2024.

This analytical report is one of the results of research carried out within the consortium of HSE University and MGIMO. It first addresses the conceptualization of national power and related categories and gives an overview of precedents. It then considers the operationalization of the components of national power that we propose. The following sections analyze the methodology used in the report. On this basis, we propose ...

Added: September 19, 2025

On the construction of frieze patterns from partitions of convex polygons by nonintersecting diagonals

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 07600.

We demonstrate in an elementary way how to construct a frieze pattern of width m-3 from a partition of a convex m-gon by nonintersecting diagonals. ...

Added: September 17, 2025

On one property of Catalan numbers

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 20584.

We give a new proof of the following statement: the Catalan number C_n is divisible by n+2 if n is odd and n ≠ 3k+1. ...

Added: September 9, 2025

Transformation of the semantics of lexical units in media discourse

Romanova T. V., In: Media Linguistics. Issue 12. Language in the Coordinates of Mass Media: Proceedings of the IX International Conference (St. Petersburg State University, July 25-28, 2025). St. Petersburg: Медиапапир, 2025 (890 p.). P. 130–137.

The article describes the transformation of the semantics of cognitive terms in media sources. ...

Added: September 4, 2025

Low Sets and Closure Properties of Counting Function Classes

Ivanashev Y., / Series Computer Science "arxiv.org". 2025.

Added: July 29, 2025

Semantic shifts in the languages of the world

МАКС Пресс, 2024.

A collection based on the materials of the annual linguistic Forum of the Institute of Linguistics of the Russian Academy of Sciences. The 2024 Forum is devoted to the diversity of semantic shifts in the languages of the world and also aims to identify systematic typological universals in them. Semantic shift is considered from the perspective of various approaches: lexical semantics, polysemy, etymology and language reconstruction, lexical typology, grammaticalization, cognitive linguistics, and cultural anthropology. Semantic calques, motivational models, and naming strategies in ... are also considered.

Added: November 16, 2024

Processing mechanisms for different types of metaphor in polysemous words in Russian

Koncha K., Orlov A., Lopukhina A. et al., Российский журнал когнитивной науки 2022 Vol. 9 No. 3-4 P. 41–61

Theories of storing the multiple senses of polysemous words in the mental lexicon suggest three key approaches. Some theories assume that only literal senses are stored, while non-literal senses are derived from them via rules; other theories suggest that all senses are stored separately. There is also a hybrid approach which assumes that some non-literal ...

Added: November 15, 2023

The semantic field of search in the Amguema dialect of Chukchi: from morphosyntax to semantics and back

Starchenko A., Вопросы языкознания 2024 No. 3 P. 99–127

The study describes the semantic field of search in Amguema Chukchi within the frame-based approach to lexical typology. I discuss lexical items of the macroframe of seeking an object (‘search for something’): the verb qərirək and the lexical affixes -rerək and -ɣiɬik; verbs of the macroframe of searching space (‘search some place’): ojpətkok and rəritɬʔewək; ...

Added: October 24, 2023