Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings
P. 76-87.
The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered, then single words and multiwords are extracted as hypernym candidates. A classification-based approach on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym candidates for 22K word entries. Evaluation showed that the proposed methods applied to a large collection of dictionary data are a viable option for automatic extraction of hyponym/hypernym pairs. The obtained data is available for research purposes.
In book
Springer, 2018
Zemicheva S., in: Lexicography of the Digital Age: Proceedings of the International Symposium (September 24–25, 2021). Tomsk State University Press, 2021. P. 344-346.
The paper presents the experience of compiling an electronic idiolect dictionary of the ideographic type, built from recordings of the speech of a speaker of a Siberian dialect. The features of the dictionary are briefly characterized. Its potential for use within the cognitive research paradigm is described. ...
Added: November 1, 2022
Kutuzov A. B., Козлова О. С., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, July 1–4, 2016). Issue 15. Moscow: Russian State University for the Humanities Publishing House, 2016. P. 288-300.
In natural language processing, distributional semantic models are known as an efficient data-driven approach to word and text representation, which allows computing meaning directly from large text corpora into word embeddings in a vector space. This paper addresses the role of linguistic preprocessing in enhancing performance of distributional models, and particularly studies pronominal anaphora ...
Added: November 12, 2016
Lopukhina A., Лопухин К. А., Носырев Г. В., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 79-94.
According to G. K. Zipf’s observation, there is a strong correlation between word frequency and polysemy. Yet word sense frequency distribution is a neglected area in computational linguistics. Furthermore, the study of sense frequency has theoretical interest and practical applications for lexicography and word sense disambiguation. Although WordNet and SemCor contain some information about sense frequency ...
Added: October 11, 2016
Shavrina T., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 29 – June 1, 2019). Issue 18(25). [s.n.], 2019. P. 576-588.
This article launches a series of studies in which popular vector word2vec models are considered not as an element of the architecture of an NLP application, but as an independent object of linguistic research. The linguist's view on the surrogate of contexts on the corpus, as which vector models can be considered, makes it possible ...
Added: September 5, 2019
Kutuzov A. B., Kuzmenko E., in: Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science. Vol. 9041. Springer, 2015. P. 47-58.
In this paper we compare the Russian National Corpus to a larger Russian web corpus composed in 2014; the assumption behind our work is that the National corpus, being limited by the texts it contains and their proportions, presents lexical contexts (and thus meanings) which are different from those found ‘in the wild’ or in ...
Added: April 23, 2015
Ryzhova A., Ryzhova D., Sochenkov I., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2021). Issue 20: Main Volume. 2021. P. 597-606.
The paper presents the models detecting the degree of semantic change in Russian nouns developed by the team aryzhova within the RuShiftEval competition of the Dialogue 2021 conference. We base our algorithms mostly on unsupervised distributional models and additionally test a model that uses vectors representing morphological preferences of the words in question. The best results are obtained ...
Added: October 30, 2021
Matkin N. A., Коммуникации. Медиа. Дизайн 2024
The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...
Added: November 15, 2023
Malafeev A., Мальтина Л. П., Научно-техническая информация. Серия 2: Информационные процессы и системы 2021 № 1
The paper proposes algorithms that perform automatic morphemic analysis of words and methods of distributed representations of words that indirectly use information about the morphemic composition through the averaging of vectors of same-root words. Morphemic analysis models for the Russian language are evaluated on samples of common and rare words. Several methods are proposed for obtaining ...
Added: November 9, 2020
Moshchanskaya T., Мощанская Е. Ю., Современные проблемы науки и образования 2015 № 6
Abstract:
The article deals with the development of interpreter’s thesaurus competence for situations of professional communication based on discourse-pragmatic and interdisciplinary approaches. The specificity of thesaurus competence formation for specialists and translators is described. The strategies of terms translation during consecutive interpretation of a training seminar in the field of emergency medicine are presented – the ...
Added: February 28, 2016
Sokolova A., Lobanova P., Kuzminov I., Foresight 2024 Vol. 26 No. 1 P. 155-180
Purpose
The purpose of the paper is to present an integrated methodology for identifying trends in a particular subject area based on a combination of advanced text mining and expert methods. The authors aim to test it in an area of clinical psychology and psychotherapy in 2010–2019.
Design/methodology/approach
The authors demonstrate the way of applying text-mining and the ...
Added: October 12, 2023
Vlasenko S. V., Заславская В. А., Вестник Тверского государственного университета. Серия: Филология 2015 № 4 P. 222-234
The paper features a number of legal translation problems caused by the shaping of the Pan-European legal English and its rapid unification. In so doing, the paper aims at prioritizing those challenges associated with legal translation, which are seemingly linguistic by form, but affect the substantive comprehension of terminologies, legal norms and their definitions in ...
Added: November 23, 2015
Aleshinskaya E., Albatsha Ahmad, in: Procedia Computer Science. Issue 169: Postproceedings of the 10th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2019 (Tenth Annual Meeting of the BICA Society). Elsevier, 2020. P. 326-329.
The paper presents the results of the cognitive modeling of the COMPUTER SCIENCE terminological system in the form of a thesaurus. The thesaurus comprises over 3000 units, which are drawn from explanatory monolingual and bilingual dictionaries of computer science terms representing the basic phenomena and processes in the professional context. Methodologically, the analysis is based ...
Added: April 13, 2021
Kutuzov A. B., Andreev I., in: Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference “Dialogue” (2015). Issue 14(21). M.: Russian State University for the Humanities, 2015. P. 143-154.
Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of Russian Semantic Similarity Evaluation track, where our models took from 2nd ...
Added: May 31, 2015
Zobnin A., Elistratova E., in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Issue W19-43. Association for Computational Linguistics, 2019. P. 244-249.
Most word embedding algorithms such as word2vec or fastText construct two sorts of vectors: for words and for contexts. Naive use of vectors of only one sort leads to poor results. We suggest using an indefinite inner product in the skip-gram negative sampling algorithm. This allows us to use only one sort of vectors without loss of ...
Added: November 9, 2019
Ryazanskaya G., Khudyakova M., in: Proceedings of the LREC 2020 Workshop on: Resources and Processing of Linguistic, Para-linguistic and Extra-linguistic Data from People with Various Forms of Cognitive/Psychiatric/Developmental Impairments (RaPID-3). European Language Resources Association (ELRA), 2020. P. 98-107.
Disorganized, or incoherent, speech is one of the important criteria for diagnosing schizophrenia. However, there is still a lack of a rather quick objective method of measuring speech coherence. Automated discourse analysis is a possible solution to this problem. We analyzed discourse coherence in a set of spoken narratives by people with schizophrenia and neurotypical speakers ...
Added: February 2, 2021
Muravieva L., Narratorium: междисциплинарный журнал 2016 № 1 (9)
The article turns to the study of the semantic potential of the mise en abyme narrative figure. Traditionally, the mise en abyme is regarded as a means of creating a metafictional effect, while the semantic structure of the text containing mise en abyme is not considered. In this paper, this phenomenon is studied in ...
Added: October 3, 2016
Braslavski P., Ustalov D., Mukhin M. et al., in: Proceedings of the 8th Global WordNet Conference, GWC 2016. Bucharest: [s.n.], 2016. P. 58-65.
YARN (Yet Another RussNet), a project started in 2013, aims at creating a large open WordNet-like thesaurus for Russian by means of crowdsourcing. The first stage of the project was to create noun synsets. Currently, the resource comprises 100K+ word entries and 46K+ synsets. More than 200 people have taken part in assembling synsets throughout ...
Added: November 9, 2018
Malik M. S., Imran T., Mona Mamdouh J., PeerJ Computer Science 2023 Vol. 9 Article e1248
Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model such ...
Added: September 4, 2023
Muhammad Z. Y., Malik M. S., Ignatov D. I., Expert Systems with Applications 2023 Vol. 227 Article 120236
Product defects are a widespread concern for manufacturers when conducting quality and customer relationship management. Prior approaches addressed many electronic products; however, cell phones are still unexplored. Moreover, prior work mainly focused on the lexicon, probabilistic graphic, failure mode, and effect analysis models, but the utilization of word embeddings and language models is not explored. State-of-the-art contextual word embeddings and language models generate automated features and ...
Added: June 13, 2023
Vlasenko S. V., Voronkov N., Вестник Тверского государственного университета. Серия: Филология 2015 № 2 P. 12-24
The article advocates a standpoint whereby semantic relations of proximity between terms of law are not established routinely and linguistic data are determined by the existing pattern and interaction of internal legal domains. The role of legal linguistics is emphasized; Russian–English correspondences for terminological attributes ‘fiscal’ and ‘financial’ are provided from the Russian and Anglo-American legal languages ...
Added: May 14, 2015
Malik M. S., Nawaz A., Jamjoom M. M. et al., Intelligent Data Analysis 2023 P. 1-21
Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the ...
Added: February 26, 2024
Antropova O., Arslanova E., Shaposhnikov M. et al., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 133-138.
The study deals with post-processing of a noisy collection of synsets created using crowdsourcing. First, we cluster long synsets in three different ways. Second, we apply four cluster cleaning techniques based either on word popularity or word embeddings. Evaluation shows that the method based on word embeddings and existing dictionary definitions delivers best results. ...
Added: November 9, 2018
Tyers F. M., Keleg A., Pirinen T., in: Proceedings of The 12th Language Resources and Evaluation Conference. Vol. 12. European Language Resources Association (ELRA), 2020. P. 3842-3850.
Morphological analysis is one of the tasks that have been studied for years. Different techniques have been used to develop models for performing morphological analysis. Models based on finite state transducers have proved to be more suitable for languages with low available resources. In this paper, we have developed a method for weighting a morphological ...
Added: April 20, 2021
Zemicheva S., in: Modern Development of Slavic Lexicology and Lexicography. An International Collective Monograph. Moscow: V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences, 2022. P. 109-119.
The paper analyzes the experience of creating an idiolect dictionary of the ideographic type and its distinctive features against the background of existing lexicographic products. Parameters such as the principles of material selection, the size of the word list, and the representation of different types of word relations are compared. The dictionary makes it possible, for the first time, to present the worldview of a dialect speaker holistically and clearly as a hierarchical conceptual structure, which determines its novelty and its potential as a source for research. ...
Added: November 28, 2022