National Research University Higher School of Economics

Combining neural language models for word sense induction

P. 105–121.
Arefyev N., Boris S., Aleksashina T.

Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the sense in which the word is used. Recently, a new approach to this task was proposed: it generates possible substitutes for the ambiguous word in a particular context using neural language models and then clusters sparse bag-of-words vectors built from these substitutes. In this work, we apply this approach to the Russian language and improve it in two ways. First, we propose methods of combining the left and right contexts, which yield better substitutes. Second, instead of using a fixed number of clusters for all ambiguous words, we propose a technique for selecting an individual number of clusters for each word. Our approach establishes a new state of the art, improving on the previous best WSI results for Russian on two RUSSE 2018 datasets by a large margin.
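The pipeline outlined in the abstract (substitutes from a language model, sparse bag-of-words vectors, clustering, and a per-word number of clusters) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the left/right combination rule P(w|L)·P(w|R)/P(w), average-linkage clustering with cosine distance, and silhouette-based selection of the number of clusters are choices made for this sketch, and all function names are hypothetical. The language-model step itself is stubbed out, so the code takes probability dictionaries or substitute sets as input, and the toy contexts are invented English stand-ins rather than RUSSE 2018 data.

```python
import math
from itertools import combinations


def combine_left_right(p_left, p_right, prior):
    """ASSUMPTION: treat the left context L and right context R as conditionally
    independent given the substitute w, so P(w | L, R) ~ P(w | L) * P(w | R) / P(w).
    One illustrative rule, not necessarily one of the chapter's methods."""
    scores = {}
    for w, pl in p_left.items():
        pr, pw = p_right.get(w, 0.0), prior.get(w, 0.0)
        if pr > 0 and pw > 0:
            scores[w] = pl * pr / pw
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total > 0 else {}


def top_substitutes(dist, k):
    """Keep the k most probable substitutes as a binary bag of words (a set)."""
    ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(w for w, _ in ranked[:k])


def cosine_distance(a, b):
    """Cosine distance between two binary bag-of-words vectors stored as sets."""
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, merging down to k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(dist, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    n, total = len(labels), 0.0
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def induce_senses(substitute_sets, max_k=5):
    """Cluster one word's contexts, choosing the number of clusters per word.
    ASSUMPTION: k maximises the silhouette score (illustrative criterion)."""
    n = len(substitute_sets)
    dist = [[cosine_distance(a, b) for b in substitute_sets] for a in substitute_sets]
    best_k, best_labels, best_score = 1, [0] * n, -math.inf
    for k in range(2, min(max_k, n - 1) + 1):
        labels = average_linkage(dist, k)
        score = silhouette(dist, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


# Hypothetical substitute sets for five contexts of an ambiguous word like "bank".
contexts = [
    {"lender", "firm", "branch", "institution"},
    {"lender", "firm", "office", "institution"},
    {"lender", "branch", "office", "institution"},
    {"shore", "coast", "side", "edge"},
    {"shore", "coast", "riverside", "edge"},
]
k, labels = induce_senses([frozenset(c) for c in contexts], max_k=5)
print(k, labels)  # 2 [0, 0, 0, 1, 1]
```

Choosing k separately for each word, rather than fixing it globally, lets a word with two clearly separated usages end up with two clusters while another word gets more; in a real system `combine_left_right` and `top_substitutes` would turn per-context language-model outputs into the substitute sets passed to `induce_senses`.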

Language: English
DOI
Keywords: word sense induction, language models, contextual substitutes
Publication based on the results of:
Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers
Vol. 11832. Cham: Springer, 2019.
Similar publications
The voice of Twitter: observable subjective well-being inferred from tweets in Russian
Smetanin S., Komarov M., PeerJ Computer Science 2022 Vol. 8 Article e1181
As one of the major platforms of communication, social networks have become a valuable source of opinions and emotions. Considering that sharing of emotions offline and online is quite similar, historical posts from social networks seem to be a valuable source of data for measuring observable subjective well-being (OSWB). In this study, we calculated OSWB ...
Added: December 29, 2022
An Interpretable Approach to Lexical Semantic Change Detection with Lexical Substitution
Arefyev N. V., Bykov D. A., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2021). Issue 20: Main volume. 2021. P. 31–46.
Added: September 23, 2021
Language Models for Cloze Task Answer Generation in Russian
Lopukhina A., Pletenev S., Nikiforova A. et al., in: Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources. Marseille: European Language Resources Association (ELRA), 2020. P. 28–37.
Linguistic predictability is the degree of confidence that a language unit (word, part of speech, etc.) will be the next in the sequence. Experiments have shown that the correct prediction simplifies the perception of a language unit and its integration into the context. As a result of an incorrect prediction, language processing slows down. Currently, ...
Added: April 20, 2021
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
Arefyev N., Sheludko B., Podolskiy A. et al., in: Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2020. P. 1242–1255.
Lexical substitution, i.e. generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technology that can be used as a backbone of various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc. In this paper, we present a large-scale comparative study ...
Added: December 7, 2020
SumTitles: a Summarization Dataset with Low Extractiveness
Malykh V., Chernis K., Artemova E. et al., in: Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2020. Ch. 503. P. 5718–5730.
The existing dialogue summarization corpora are significantly extractive. We introduce a methodology for dataset extractiveness evaluation and present a new low-extractive corpus of movie dialogues for abstractive text summarization along with baseline evaluation. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) ...
Added: December 6, 2020
Neural GRANNy at SemEval-2019 Task 2: A combined approach for better modeling of semantic relationships in semantic frame induction
Arefyev N., Sheludko B., Adis D. et al., in: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019). Minneapolis: Association for Computational Linguistics, 2019. P. 31–38.
We describe our solutions for semantic frame and role induction subtasks of SemEval 2019 Task 2. Our approaches achieved the highest scores, and the solution for the frame induction problem officially took first place. The main contributions of this paper are related to the semantic frame induction problem. We propose a combined approach that ...
Added: October 10, 2020
Hm2 at SemEval-2019 Task 2: Unsupervised Frame Induction Using Contextualized and Uncontextualized Word Embeddings
Anwar S., Ustalov D., Arefyev N. et al., in: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019). Minneapolis: Association for Computational Linguistics, 2019. P. 125–129.
We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (Qasem-iZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word and their context ...
Added: October 10, 2020
Word2vec not dead: predicting hypernyms of co-hyponyms is better than reading definitions
Arefyev N. V., Fedoseev M., Kabanov A. et al., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (Moscow, June 17–20, 2020). Issue 19(26): supplementary volume. 2020. P. 13–32.
Expert-built lexical resources are known to provide information of good quality at the cost of low coverage. This property limits their applicability in modern NLP applications. Building descriptions of lexical-semantic relations manually in sufficient volume requires a huge amount of qualified human labour. However, given some initial version of a taxonomy is already built, automatic ...
Added: October 9, 2020
RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language
Panchenko A., Lopukhina A., Ustalov D. et al., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings. Moscow: Conference Proceedings Editorial Board, 2018. P. 547–564.
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...
Added: October 9, 2020
How much does a word weight? Weighting word embeddings for word sense induction
Arefyev N., Ermolaev P., Panchenko A., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings. Moscow: Conference Proceedings Editorial Board, 2018. P. 68–84.
The paper describes our participation in the first shared task on word sense induction and disambiguation for the Russian language RUSSE'2018 [Panchenko et al., 2018]. For each of several dozens of ambiguous words, the participants were asked to group text fragments containing it according to the senses of this word, which were not provided beforehand, ...
Added: October 9, 2020
Neural networks with attention for word sense induction
Struyanskiy O., Arefyev N., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5–7, 2018. Aachen: CEUR Workshop Proceedings, 2018. P. 208–213.
Attentional neural networks have achieved remarkable results for a number of tasks in the past few years. The fascinating success of neural networks with attention mechanism in natural language processing, especially in machine translation, suggests that these models can capture the meaning of ambiguous words considering their context. In this paper we introduce a new ...
Added: October 9, 2020
Combining Lexical Substitutes in Neural Word Sense Induction
Arefyev N., Boris S., Panchenko A., in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019. INCOMA Ltd, 2019. P. 62–70.
Word Sense Induction (WSI) is the task of grouping occurrences of an ambiguous word according to their meaning. In this work, we improve the approach to WSI proposed by Amrami and Goldberg (2018) based on clustering of lexical substitutes for an ambiguous word in a particular context obtained from neural language models. Namely, we ...
Added: October 9, 2020
L-models and R-models for Lambek calculus enriched with additives and the multiplicative unit
Kanovich M., Kuznetsov S., Scedrov A., in: Logic, Language, Information, and Computation: 26th International Workshop, WoLLIC 2019, Utrecht, The Netherlands, July 2–5, 2019, Proceedings. Vol. 11541: Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2019. P. 373–391.
Language and relational models, or L-models and R-models, are two natural classes of models for the Lambek calculus. Completeness w.r.t. L-models was proved by Pentus and completeness w.r.t. R-models by Andréka and Mikulás. It is well known that adding both additive conjunction and disjunction together yields incompleteness, because of the distributive law. The product-free Lambek ...
Added: September 4, 2019
RUSSE2018: a Shared Task on Word Sense Induction for the Russian Language
Panchenko A., Lopukhina A., Ustalov D. et al., Computational Linguistics and Intellectual Technologies 2018 No. 17 P. 547–564
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...
Added: June 7, 2018
Word Sense Induction for Russian: Deep Study and Comparison with Dictionaries
Lopukhin K. A., Iomdin B., Lopukhina A., Computational Linguistics and Intellectual Technologies 2017 Vol. 1 No. 16 P. 121–134
The assumption that senses are mutually disjoint and have clear boundaries has been drawn into doubt by several linguists and psychologists. The problem of word sense granularity is widely discussed both in lexicographic and in NLP studies. We aim to study word senses in the wild, in raw corpora, by performing word sense induction (WSI). WSI ...
Added: September 27, 2017