How much does a word weight? Weighting word embeddings for word sense induction

P. 68–84.
Arefyev N., Ermolaev P., Panchenko A.

The paper describes our participation in RUSSE'2018, the first shared task on word sense induction and disambiguation for the Russian language [Panchenko et al., 2018]. For each of several dozen ambiguous words, participants were asked to group the text fragments containing the word according to its senses, which were not provided beforehand; hence the “induction” part of the task. For instance, given the word “bank” and a set of text fragments (also known as “contexts”) in which it occurs, e.g. “bank is a financial institution that accepts deposits” and “river bank is a slope beside a body of water”, a participant had to cluster the contexts into a number of clusters not known in advance, corresponding in this case to the “company” and “area” senses of “bank”. The organizers proposed three evaluation datasets of varying complexity and text genre, based respectively on Wikipedia texts, Web pages, and a dictionary of the Russian language. We present two experiments, a positive one and a negative one, based respectively on clustering contexts represented as weighted averages of word embeddings and on machine translation using two state-of-the-art production neural machine translation systems. Our team achieved the second-best result on two datasets and the third-best result on the remaining dataset among 18 participating teams. We substantially outperformed competitive state-of-the-art baselines from previous years based on sense embeddings.
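As a rough illustration of the idea named in the abstract (contexts represented as weighted averages of word embeddings, then clustered), the following minimal sketch may help. It is not the authors' system: the 2-D toy embeddings, the per-word weights, the similarity threshold, and the greedy clustering rule are all assumptions made for this example, and the paper's actual weighting scheme and clustering algorithm are not specified here.

```python
import math

# Toy 2-D "embeddings" (assumption); a real system would load pretrained vectors.
EMBEDDINGS = {
    "financial": (1.0, 0.1),
    "deposits": (0.9, 0.2),
    "institution": (0.8, 0.3),
    "river": (0.1, 1.0),
    "slope": (0.2, 0.9),
    "water": (0.0, 1.0),
    "is": (0.5, 0.5),
    "a": (0.5, 0.5),
}

# Hypothetical word weights: uninformative function words are down-weighted.
WEIGHTS = {"is": 0.05, "a": 0.05}
DEFAULT_WEIGHT = 1.0


def context_vector(tokens):
    """Return the weighted average of the embeddings of the known tokens."""
    total = [0.0, 0.0]
    weight_sum = 0.0
    for tok in tokens:
        vec = EMBEDDINGS.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        w = WEIGHTS.get(tok, DEFAULT_WEIGHT)
        total[0] += w * vec[0]
        total[1] += w * vec[1]
        weight_sum += w
    if weight_sum == 0.0:
        return (0.0, 0.0)
    return (total[0] / weight_sum, total[1] / weight_sum)


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.8):
    """Greedy clustering with an open-ended number of clusters.

    Each vector joins the first cluster whose seed vector is similar
    enough; otherwise it starts a new cluster.
    """
    seeds = []
    labels = []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


contexts = [
    "bank is a financial institution that accepts deposits".split(),
    "river bank is a slope beside a body of water".split(),
    "deposits financial".split(),
    "water river".split(),
]
labels = cluster([context_vector(c) for c in contexts])
print(labels)
```

With these toy values the two "financial" contexts end up in one cluster and the two "river" contexts in another. The number of clusters is not fixed in advance; it follows from the threshold, which is one simple way to mimic the unknown number of senses in the task.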

Language: English
Keywords: machine translation, word embeddings, word sense induction

In book

Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings
M.: Conference Proceedings Editorial board, 2018.
Similar publications
Language barriers in metaverses: the power of neural networks in translation
Osipov D., Евразийский филологический вестник 2023 No. 2 P. 21–39
The metaverse is a shared, virtual space, accessible to users worldwide, offering a platform for global interaction. The physical barriers of geographical location and time are non-existent, allowing for seamless connectivity and interaction. Language barriers within and between metaverses present substantial impediments to fluid interaction and collaboration. Failure to address this linguistic divergence can stifle ...
Added: March 19, 2024
The relationship between expert categories and automatic metrics used to evaluate translation quality
Sosnin A., Balakina Y. V., Kashchikhin A. N., Вестник Санкт-Петербургского университета. Язык и литература 2022 Vol. 19 No. 1 P. 125–148
The article evaluates the quality of translation; we consider the applied and pragmatic aspects of such evaluation in the conditions of the current rapid increase in the number of texts to be translated. The article summarizes a plethora of assessment principles, each having its merits and drawbacks, and examines the correlation between the categories of adequacy and equivalence as ...
Added: May 31, 2022
An Interpretable Approach to Lexical Semantic Change Detection with Lexical Substitution
Arefyev N.V., Bykov D. A., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2021). Issue 20: Main volume. 2021. P. 31–46.
Added: September 23, 2021
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
Nikolay Arefyev, Sheludko B., Podolskiy A. et al., in: Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2020. P. 1242–1255.
Lexical substitution, i.e. generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technology that can be used as a backbone of various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc. In this paper, we present a large-scale comparative study ...
Added: December 7, 2020
A resource-light method for cross-lingual semantic textual similarity
Glavas G., Franco-Salvador M., Ponzetto S. et al., Knowledge-Based Systems 2018 Vol. 143 P. 1–9
Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that for many ...
Added: October 29, 2020
Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase
Gharavi E., Veisi H., Rosso P., Neural Computing and Applications 2020 Vol. 32 No. 14 P. 10593–10607
The efficiency and scalability of plagiarism detection systems have become a major challenge due to the vast amount of available textual data in several languages over the Internet. Plagiarism occurs in different levels of obfuscation, ranging from the exact copy of original materials to text summarization. Consequently, designed algorithms to detect plagiarism should be robust ...
Added: October 29, 2020
Evaluation of Vector Transformations for Russian Word2Vec and FastText Embeddings
Korogodina O., Karpik O., Klyshinsky E., in: GraphiCon 2020 - Proceedings of the 30th International Conference on Computer Graphics and Machine Vision. St. Petersburg: CEUR-WS, 2020.
Authors of Word2Vec claimed that their technology could solve the word analogy problem using the vector transformation in the introduced vector space. However, practice demonstrates that this is not always true. In this paper, we investigate several Word2Vec and FastText models trained for the Russian language and find the reasons for this inconsistency. We ...
Added: October 21, 2020
Neural GRANNy at SemEval-2019 Task 2: A combined approach for better modeling of semantic relationships in semantic frame induction
Arefyev Nikolay, Sheludko B., Adis D. et al., in: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019). Minneapolis: Association for Computational Linguistics, 2019. P. 31–38.
We describe our solutions for semantic frame and role induction subtasks of SemEval 2019 Task 2. Our approaches got the highest scores, and the solution for the frame induction problem officially took the first place. The main contributions of this paper are related to the semantic frame induction problem. We propose a combined approach that ...
Added: October 10, 2020
Hm2 at semeval 2019 task2: Unsupervised frame induction using contextualized and uncontextualized word embeddings
Anwar S., Ustalov D., Arefyev N. et al., in: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019). Minneapolis: Association for Computational Linguistics, 2019. P. 125–129.
We present our system for semantic frame induction that showed the best performance in Subtask B.1 and finished as the runner-up in Subtask A of the SemEval 2019 Task 2 on unsupervised semantic frame induction (Qasem-iZadeh et al., 2019). Our approach separates this task into two independent steps: verb clustering using word and their context ...
Added: October 10, 2020
Word2vec not dead: predicting hypernyms of co-hyponyms is better than reading definitions
Arefyev N V., Fedoseev M., Kabanov A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 17–20, 2020). Issue 19(26): Supplementary volume. 2020. P. 13–32.
Expert-built lexical resources are known to provide information of good quality at the cost of low coverage. This property limits their applicability in modern NLP applications. Building descriptions of lexical-semantic relations manually in sufficient volume requires a huge amount of qualified human labour. However, given that some initial version of a taxonomy is already built, automatic ...
Added: October 9, 2020
Russe’2018: A shared task on word sense induction for the Russian language
Panchenko A., Lopukhina A., Ustalov D. et al., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings. M.: Conference Proceedings Editorial board, 2018. P. 547–564.
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...
Added: October 9, 2020
Neural networks with attention for word sense induction
Struyanskiy O., Arefyev N., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. Aachen: CEUR Workshop Proceedings, 2018. P. 208–213.
Attentional neural networks have achieved remarkable results for a number of tasks in the past few years. The fascinating success of neural networks with attention mechanism in natural language processing, especially in machine translation, suggests that these models can capture the meaning of ambiguous words considering their context. In this paper we introduce a new ...
Added: October 9, 2020
Combining neural language models for word sense induction
Arefyev N., Boris S., Aleksashina T., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers. Vol. 11832. Cham: Springer, 2019. P. 105–121.
Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the expressed sense of this word. Recently a new approach to this task was proposed, which generates possible substitutes for the ambiguous word in a particular context using neural language models, and then clusters sparse bag-of-words vectors built from ...
Added: October 9, 2020
Combining Lexical Substitutes in Neural Word Sense Induction
Nikolay Arefyev, Boris S., Panchenko A., in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019. INCOMA Ltd, 2019. P. 62–70.
Word Sense Induction (WSI) is the task of grouping occurrences of an ambiguous word according to their meaning. In this work, we improve the approach to WSI proposed by Amrami and Goldberg (2018) based on clustering of lexical substitutes for an ambiguous word in a particular context obtained from neural language models. Namely, we ...
Added: October 9, 2020
Word Embedding for Semantically Related Words: An Experimental Study
Karyaeva M., Braslavski P., Sokolov V., Automatic Control and Computer Sciences 2019 Vol. 53 P. 638–643
The ability to identify semantic relations between words has made a word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule that a higher similarity can be reached if two words have a similar context. Each word can be represented as a vector, so the closest coordinates of vectors can be interpreted ...
Added: April 10, 2020
Data-driven models and computational tools for neurolinguistics: a language technology perspective
Ekaterina Artemova, Bakarov A., Artemov A. et al., Journal of Cognitive Science 2020 Vol. 1 No. 21 P. 15–52
In this paper, our focus is the connection and influence of language technologies on the research in neurolinguistics. We present a review of brain imaging-based neurolinguistics studies with a focus on natural language representations, such as word embeddings and pre-trained language models. Mutual enrichment of neurolinguistics and language technologies leads to development of brain-aware natural ...
Added: January 17, 2020
Learning Word Embeddings without Context Vectors
Zobnin A., Elistratova E., in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Issue W19-43. Association for Computational Linguistics, 2019. P. 244–249.
Most word embedding algorithms such as word2vec or fastText construct two sorts of vectors: for words and for contexts. Naive use of vectors of only one sort leads to poor results. We suggest using an indefinite inner product in the skip-gram negative sampling algorithm. This allows us to use only one sort of vectors without loss of ...
Added: November 9, 2019
A Dataset for Noun Compositionality Detection for a Slavic Language
Puzyrev D., Shelmanov A., Panchenko A. et al., in: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019, Florence, Italy, Association for Computational Linguistics. Association for Computational Linguistics, 2019. P. 56–62.
This paper presents the first gold-standard resource for Russian annotated with compositionality information of noun compounds. The compound phrases are collected from the Universal Dependency treebanks according to part of speech patterns, such as ADJ+NOUN or NOUN+NOUN, using the gold-standard annotations. Each compound phrase is annotated by two experts and a moderator according to the following ...
Added: October 30, 2019
Noun Compositionality Detection using Distributional Semantics for the Russian Language
Puzyrev D. A., Shelmanov A., Panchenko A. et al., in: Analysis of Images, Social Networks and Texts. 8th International Conference AIST 2019. Springer, 2019. P. 218–229.
In this paper, we present the first gold-standard corpus of Russian noun compounds annotated with compositionality information. We used Universal Dependency treebanks to collect noun compounds according to part of speech patterns, such as ADJ-NOUN or NOUN-NOUN and annotated them according to the following schema: a phrase can be either compositional, non-compositional, or ambiguous (i.e., ...
Added: October 30, 2019
Automatic Mining of Discourse Connectives for Russian
Toldova S., Pisarevskaya D., Kobozeva M., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 79–87.
The identification of discourse connectives plays an important role in many discourse processing approaches. Among them there are functional words usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and not grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types of connectives signal certain relations between ...
Added: October 26, 2018
Text classification with deep learning neural networks
Voronkov Ilia, Amajd M., Kaimuldenov Z., in: Actual Problems of System and Software Engineering 2017. Proceedings of the 5th International Conference on Actual Problems of System and Software Engineering Supported by Russian Foundation for Basic Research. Project #17-07-20565 Moscow, Russia, November 14-16, 2017, 408 P. Vol. 1989. Aachen: CEUR Workshop Proceedings, 2017. P. 362–370.
In this paper, we analyze the use of different neural networks for the text classification task. The accuracy of the studied text classifiers can be changed by a small number of previously classified texts. This is important due to the fact that in many applications of text classification a large number of unlabeled texts are easily accessible, while ...
Added: August 16, 2018
RUSSE2018: a Shared Task on Word Sense Induction for the Russian Language
Panchenko A., Lopukhina A., Ustalov D. et al., Computational Linguistics and Intellectual Technologies 2018 No. 17 P. 547–564
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...
Added: June 7, 2018