Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings

Karyaeva M., Braslavski P., Kiselev Y.

doi:10.1007/978-3-030-11027-7_8

P. 76–87.

The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered, then single words and multiwords are extracted as hypernym candidates. A classification-based approach on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym candidates for 22K word entries. Evaluation showed that the proposed methods applied to a large collection of dictionary data are a viable option for automatic extraction of hyponym/hypernym pairs. The obtained data is available for research purposes.

Language: English

Keywords: thesaurus, word2vec, semantic relations

In book

Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018

Springer, 2018.

WORD VECTOR MODELS AS AN OBJECT OF LINGUISTIC RESEARCH

Shavrina T., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 29 мая – 1 июня 2019 г.). Issue 18(25). [s.n.], 2019. P. 576–588.

This article launches a series of studies in which popular vector word2vec models are considered not as an element of the architecture of an NLP application, but as an independent object of linguistic research. The linguist's view on the surrogate of contexts on the corpus, as which vector models can be considered, makes it possible ...

Added: September 5, 2019

Detection of semantic changes in Russian nouns with distributional models and grammatical features

Ryzhova A., Ryzhova D., Sochenkov I., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2021). Issue 20: Main volume. 2021. P. 597–606.

The paper presents the models detecting the degree of semantic change in Russian nouns developed by the team aryzhova within the RuShiftEval competition of the Dialogue 2021 conference. We base our algorithms mostly on unsupervised distributional models and additionally test a model that uses vectors representing morphological preferences of the words in question. The best results are obtained ...

Added: October 30, 2021

Automated Word Sense Frequency Estimation for Russian Nouns

Lopukhina A., Лопухин К. А., Носырев Г. В., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 79–94.

According to G. K. Zipf’s observation, there is a strong correlation between word frequency and polysemy. Yet word sense frequency distribution is a neglected area in computational linguistics. Furthermore, the study of sense frequency has theoretical interest and practical applications for lexicography and word sense disambiguation. Although WordNet and SemCor contain some information about sense frequency ...

Added: October 11, 2016

Identifying emerging trends and hot topics through intelligent data mining: the case of clinical psychology and psychotherapy

Sokolova A., Lobanova P., Kuzminov I., Foresight 2024 Vol. 26 No. 1 P. 155–180

Purpose: The purpose of the paper is to present an integrated methodology for identifying trends in a particular subject area based on a combination of advanced text mining and expert methods. The authors aim to test it in the area of clinical psychology and psychotherapy in 2010–2019. Design/methodology/approach: The authors demonstrate the way of applying text-mining and the ...

Added: October 12, 2023

A cognitive model to enhance professional competence in computer science

Aleshinskaya E., Albatsha Ahmad, in: Procedia Computer Science. Issue 169: Postproceedings of the 10th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2019 (Tenth Annual Meeting of the BICA Society). Elsevier, 2020. P. 326–329.

The paper presents the results of the cognitive modeling of the COMPUTER SCIENCE terminological system in the form of a thesaurus. The thesaurus comprises over 3000 units, which are drawn from explanatory monolingual and bilingual dictionaries of computer science terms representing the basic phenomena and processes in the professional context. Methodologically, the analysis is based ...

Added: April 13, 2021

Learning Word Embeddings without Context Vectors

Zobnin A., Elistratova E., in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Issue W19-43. Association for Computational Linguistics, 2019. P. 244–249.

Most word embedding algorithms, such as word2vec or fastText, construct two sorts of vectors: for words and for contexts. Naive use of vectors of only one sort leads to poor results. We suggest using an indefinite inner product in the skip-gram negative sampling algorithm. This allows us to use only one sort of vectors without loss of ...

Added: November 9, 2019

КОНСТРУИРОВАНИЕ ОБРАЗА ГОРОДА В ОФИЦИАЛЬНОЙ И ОБЫДЕННОЙ КОММУНИКАЦИИ: СРАВНИТЕЛЬНЫЙ АНАЛИЗ (НА МАТЕРИАЛЕ СОЦИАЛЬНЫХ МЕДИА)

Matkin N., Коммуникации. Медиа. Дизайн 2024

The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...

Added: November 15, 2023

Распределённые представления редких слов русского языка, учитывающие векторы однокоренных слов

Malafeev A., Мальтина Л. П., Научно-техническая информация. Серия 2: Информационные процессы и системы 2021 № 1

The paper proposes algorithms that perform automatic morphemic analysis of words and methods of distributed representations of words that indirectly use information about the morphemic composition through the averaging of vectors of same-root words. Morphemic analysis models for the Russian language are evaluated on samples of common and rare words. Several methods are proposed for obtaining ...

Added: November 9, 2020

К ВОПРОСУ О ФОРМИРОВАНИИ ТЕЗАУРУСНОЙ КОМПЕТЕНЦИИ УСТНОГО ПЕРЕВОДЧИКА

Moshchanskaya T., Мощанская Е. Ю., Современные проблемы науки и образования 2015 № 6

The article deals with the development of an interpreter’s thesaurus competence for situations of professional communication based on discourse-pragmatic and interdisciplinary approaches. The specificity of thesaurus competence formation for specialists and translators is described. The strategies of term translation during consecutive interpretation of a training seminar in the field of emergency medicine are presented – the ...

Added: February 28, 2016

You shall know a piece by the company it keeps. Chess plays as a data for word2vec models

Orekhov B., Series Computer Science "arxiv.org", 2024.

In this paper, I apply linguistic methods of analysis to non-linguistic data, chess plays, metaphorically equating one with the other and seeking analogies. Chess game notations are also a kind of text, and one can consider the records of moves or positions of pieces as words and statements in a certain language. In this article ...

Added: August 8, 2024

An Unsupervised Method for Weighting Finite-state Morphological Analyzers

Tyers F. M., Keleg A., Pirinen T., in: Proceedings of The 12th Language Resources and Evaluation Conference. Vol. 12. European Language Resources Association (ELRA), 2020. P. 3842–3850.

Morphological analysis is one of the tasks that have been studied for years. Different techniques have been used to develop models for performing morphological analysis. Models based on finite state transducers have proved to be more suitable for languages with low available resources. In this paper, we have developed a method for weighting a morphological ...

Added: April 20, 2021

Юридический перевод в условиях унификации терминологии частного права в юридическом пан-европейском языке

Vlasenko S. V., Заславская В. А., Вестник Тверского государственного университета. Серия: Филология 2015 № 4 P. 222–234

The paper features a number of legal translation problems caused by the shaping of the Pan-European legal English and its rapid unification. In so doing, the paper aims at prioritizing those challenges associated with legal translation, which are seemingly linguistic by form, but affect the substantive comprehension of terminologies, legal norms and their definitions in ...

Added: November 23, 2015

Comparing Neural Lexical Models of a Classic National Corpus and a Web Corpus: The Case for Russian

Kutuzov A. B., Kuzmenko E., in: Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science. Vol. 9041. Springer, 2015. P. 47–58.

In this paper we compare the Russian National Corpus to a larger Russian web corpus composed in 2014; the assumption behind our work is that the National corpus, being limited by the texts it contains and their proportions, presents lexical contexts (and thus meanings) which are different from those found ‘in the wild’ or in ...

Added: April 23, 2015

Texts in, meaning out: neural language models in semantic similarity task for Russian

Kutuzov A. B., Andreev I., in: Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference “Dialogue” (2015). Issue 14(21). M.: Russian State University for the Humanities, 2015. P. 143–154.

Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of Russian Semantic Similarity Evaluation track, where our models took from 2nd ...

Added: May 31, 2015

Automated Analysis of Discourse Coherence in Schizophrenia: Approximation of Manual Measures

Ryazanskaya G., Khudyakova M., in: Proceedings of the LREC 2020 Workshop on: Resources and Processing of Linguistic, Para-linguistic and Extra-linguistic Data from People with Various Forms of Cognitive/Psychiatric/Developmental Impairments (RaPID-3). European Language Resources Association (ELRA), 2020. P. 98–107.

Disorganized, or incoherent, speech is one of the important criteria for diagnosing schizophrenia. However, there is still a lack of a rather quick objective method of measuring speech coherence. Automated discourse analysis is a possible solution to this problem. We analyzed discourse coherence in a set of spoken narratives by people with schizophrenia and neurotypical speakers ...

Added: February 2, 2021

Mise en abyme: способы презентации в нарративном тексте

Muravieva L., Narratorium: междисциплинарный журнал 2016 № 1 (9)

The article turns to the study of the semantic potential of the mise en abyme narrative figure. Traditionally, the mise en abyme is regarded to be a means of creating a metafictional effect while the semantic structure of the text containing mise en abyme is not considered. In this paper, this phenomenon is studied in ...

Added: October 3, 2016

YARN: Spinning-in-progress

Braslavski P., Ustalov D., Mukhin M. et al., in: Proceedings of the 8th Global WordNet Conference, GWC 2016. Bucharest: [s.n.], 2016. P. 58–65.

YARN (Yet Another RussNet), a project started in 2013, aims at creating a large open WordNet-like thesaurus for Russian by means of crowdsourcing. The first stage of the project was to create noun synsets. Currently, the resource comprises 100K+ word entries and 46K+ synsets. More than 200 people have taken part in assembling synsets throughout ...

Added: November 9, 2018

How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Malik M. S., Imran T., Mona Mamdouh J., PeerJ Computer Science 2023 Vol. 9 Article e1248

Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model such ...

Added: September 4, 2023

Automated defect identification for cell phones using language context, linguistic and smoke-word models

Muhammad Z. Y., Malik M. S., Ignatov D. I., Expert Systems with Applications 2023 Vol. 227 Article 120236

Product defects are a widespread concern for manufacturers when conducting quality and customer relationship management. Prior approaches addressed many electronic products; however, cell phones are still unexplored. Moreover, prior work mainly focused on lexicon, probabilistic graphic, failure mode, and effect analysis models, but the utilization of word embeddings and language models has not been explored. State-of-the-art contextual word embeddings and language models generate automated features and ...

Added: June 13, 2023

Конгруэнтность юридических понятий и отраслевой узус как проблемы правовой лингвистики

Vlasenko S. V., Voronkov N., Вестник Тверского государственного университета. Серия: Филология 2015 № 2 P. 12–24

The article advocates a standpoint whereby semantic relations of proximity between terms of law are not established routinely and linguistic data are determined by the existing pattern and interaction of internal legal domains. The role of legal linguistics is emphasized; Russian–English correspondences for terminological attributes ‘fiscal’ and ‘financial’ are provided from the Russian and Anglo-American legal languages ...

Added: May 14, 2015

Effectiveness of ELMo embeddings, and semantic models in predicting review helpfulness

Malik M. S., Nawaz A., Jamjoom M. M. et al., Intelligent Data Analysis 2023 Vol. 28 No. 4 P. 1–21

Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the ...

Added: February 26, 2024

Cleaning Up After a Party: Post-processing Thesaurus Crowdsourced Data

Antropova O., Arslanova E., Shaposhnikov M. et al., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 133–138.

The study deals with post-processing of a noisy collection of synsets created using crowdsourcing. First, we cluster long synsets in three different ways. Second, we apply four cluster cleaning techniques based either on word popularity or word embeddings. Evaluation shows that the method based on word embeddings and existing dictionary definitions delivers the best results. ...

Added: November 9, 2018

«Идеографический словарь диалектной языковой личности» как средство изучения картины мира

Zemicheva S., in: Лексикография цифровой эпохи: сборник материалов Международного симпозиума (24–25 сентября 2021 г.). Издательство Томского государственного университета, 2021. P. 344–346.

The paper presents the experience of compiling an electronic idiolect dictionary of the ideographic type, based on recordings of the speech of a speaker of a Siberian dialect. The features of the dictionary are briefly described. Its potential for use within the cognitive research paradigm is outlined. ...

Added: November 1, 2022

Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing

Kutuzov A. B., Козлова О. С., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 1–4 июля 2016 г.). Issue 15. М.: Изд-во РГГУ, 2016. P. 288–300.

In natural language processing, distributional semantic models are known as an efficient data driven approach to word and text representation, which allows computing meaning directly from large text corpora into word embeddings in a vector space. This paper addresses the role of linguistic preprocessing in enhancing performance of distributional models, and particularly studies pronominal anaphora ...

Added: November 12, 2016