Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions

Panicheva P., Protopopova E., Bukia G., Mitrofanova O.

doi:10.1007/978-3-319-52920-2_22

P. 236–247.

This paper evaluates and compares vector-space semantic models based on the Word2Vec word embedding algorithm and on a count-based association-oriented algorithm by measuring association strength between Russian nouns and adjectives. A dataset of nouns and associated adjectives serves as the test set for a pseudo-disambiguation task. The models are trained on corpora of Russian fiction. A measure of lexical association anomaly is applied, which evaluates the similarity between the initial noun and the resulting attributive phrase. Association strength results are reported for models with different parameter values, and the best parameter combinations are proposed. The test examples that produce errors are manually annotated, and the model errors are categorized by their linguistic nature and compositionality features.
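The abstract does not specify the composition function or the exact association measure, so the following is only a minimal sketch of how a pseudo-disambiguation test of this kind could be set up. It assumes additive composition of the noun and adjective vectors and cosine similarity between the noun and the composed phrase; the toy vectors, test triples, and all function names are hypothetical illustrations, not the authors' data or implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose(noun_vec, adj_vec):
    """Additive composition of a noun and an adjective vector (an assumption)."""
    return [a + b for a, b in zip(noun_vec, adj_vec)]


def association(vectors, noun, adjective):
    """Similarity between the noun and the composed noun-adjective phrase."""
    phrase = compose(vectors[noun], vectors[adjective])
    return cosine(vectors[noun], phrase)


def pseudo_disambiguation_accuracy(vectors, test_items):
    """Share of items where the attested adjective outscores the confounder."""
    correct = 0
    for noun, attested, confounder in test_items:
        if association(vectors, noun, attested) > association(vectors, noun, confounder):
            correct += 1
    return correct / len(test_items)


# Hypothetical 3-dimensional toy embeddings, for illustration only.
TOY_VECTORS = {
    "чай": [0.9, 0.1, 0.0],         # tea
    "стол": [0.1, 0.2, 0.9],        # table
    "горячий": [0.8, 0.2, 0.1],     # hot
    "деревянный": [0.0, 0.1, 0.9],  # wooden
}

# Each item: (noun, attested adjective, confounder adjective).
TEST_ITEMS = [
    ("чай", "горячий", "деревянный"),
    ("стол", "деревянный", "горячий"),
]

if __name__ == "__main__":
    print(pseudo_disambiguation_accuracy(TOY_VECTORS, TEST_ITEMS))
```

In a real experiment the toy dictionary would be replaced by vectors from a trained model, and the error rate would be one minus the accuracy computed here.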

Language: English

Keywords: distributional semantics, word association measures, vector-space representation, evaluation

In book

Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science

Vol. 661. Switzerland: Springer, 2017.

Fear and Loathing in Russian Literature: A Case of Emotion Annotation of Short Stories of the 20th Century

Anna Moskvina, Margarita Kirina, in: 27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, Vol. 2534. Springer, 2025. P. 113–129.

The paper presents an investigation of the emotional aspect of the Russian short story of the 20th century. Our study is two-fold: firstly, we delve into emotional representation at the lexical level, building upon previous work on utilizing vector models to quantify emotional content. In this study, we introduce an annotated corpus where words are ...

Added: November 29, 2024

Automatic detection of grammatical aspect of Russian verbs based on their morphological properties

Petrunina U., Filip H., in: Proceedings of the Fourth International Workshop on Resources and Tools for Derivational Morphology. Dubrovnik: Croatian Language Technology Society, 2023.

Added: October 2, 2024

Where Is Happily Ever After? A Study of Emotions and Locations in Russian Short Stories of 1900–1930

Moskvina A., Kirina M., in: Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023). Springer, 2023. P. 123–135.

The paper tackles the problem of the automatic detection of emotions in literary texts using distributional semantics techniques. The experiment was carried out on the material of Russian short stories from the 1900-1930s. We investigated the distribution of emotional lexis across different locations in narratives. First, we calculated the semantic association score between each word ...

Added: December 9, 2023

Несчастливы по-своему: как измерить тональность литературного текста? [Unhappy in Their Own Way: How to Measure the Sentiment of a Literary Text?]

Sherstinova T., Moskvina A., Kirina M. et al., in: Труды международной конференции «Корпусная лингвистика — 2023». СПб.: Издательство Санкт-Петербургского государственного университета, 2024. P. 232–240.

In the experimental study, the results of three different approaches to the evaluation of the tonality of literary texts are compared: dictionary-based, machine learning, and distributional semantics. The material for analysis was a selection of 210 stories by Russian writers from the first three decades of the 20th century. The research showed that the correlation ...

Added: December 9, 2023

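От любви до ненависти: распределение эмоциональной лексики в русском рассказе начала XX века [From Love to Hate: The Distribution of Emotional Vocabulary in the Russian Short Story of the Early 20th Century]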

Moskvina A., Kirina M., in: Труды международной конференции «Корпусная лингвистика — 2023». СПб.: Издательство Санкт-Петербургского государственного университета, 2024. P. 156–166.

The paper presents the results of experiments investigating the distribution of emotional vocabulary in Russian short stories of the beginning of the 20th century. The emotionality of words and texts is determined automatically using distributional semantics methods, which do not require dictionaries or preliminary data annotation. The results include data ...

Added: December 9, 2023

Automated Detection of Non-Relevant Posts on the Russian Imageboard "2ch": Importance of the Choice of Word Representations

Bakarov A., Gureenkova O., Lecture Notes in Computer Science 2018 P. 16–21

This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that resolves this problem by approximating it with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task could be resolved through learning the ...

Added: December 12, 2020

The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics

Bakarov A., Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB '18), 2018. P. 153–161

Swivel (Submatrix-WIse Vector Embedding Learner) is a distributional semantic model based on counting point-wise mutual information values, capable of capturing word-context co-occurrences in the PMI matrix that were not noted in the training corpus. This model outperforms mainstream word embedding training algorithms such as Continuous Bag-of-Words, GloVe and Skip-Gram in word similarity and word analogy ...

Added: December 12, 2020

A corpus study of semantic coherence in schizophrenia in Russian written texts

Panicheva P., Litvinova T., in: The Fifth Saint Petersburg Winter Workshop on Experimental Studies of Speech and Language (Night Whites 2019). St. Petersburg: Центр научно-информационных технологий "Астерион", 2019. P. 81.

Added: October 29, 2020

Semantic Coherence in Schizophrenia in Russian Written Texts

Panicheva P., Litvinova T., in: Proceedings of the 25th Conference of Open Innovations Association FRUCT, University of Helsinki, Helsinki, Finland. Helsinki: IEEE, 2019. P. 241–249.

Schizophrenia is widely known to manifest in language disturbance. Specifically, speech incoherence, tangentiality, and derailment are indicative of the thought disorder characteristic of schizophrenia. Recent advances in distributional semantics have made it possible to measure coherence in text in a unified and objective manner. It has been shown that semantic coherence measures based on distributional semantic models ...

Added: October 29, 2020

Exploring Semantic Concreteness and Abstractness for Metaphor Identification and Beyond

Badryzlova Y., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 17 июня — 20 июня 2020 г.). Вып. 19(26). М.: Изд-во РГГУ, 2020. P. 33–47.

The paper presents a method for computing indexes of semantic concreteness and abstractness in two languages (Russian and English). These indexes are used in metaphor identification experiments in both languages; the results are either comparable to or surpass previous work and the baselines. We analyze the obtained indexes of concreteness and abstractness to see how ...

Added: August 24, 2020

Типология лексики. Компьютерные методы и инструменты [Lexical Typology. Computational Methods and Tools]

Ryzhova D., СПб.: Алетейя, 2020.

Lexical typology, the field of linguistics concerned with the comparative analysis of word meanings across languages, has by now achieved considerable success: methods for collecting and analyzing material have been developed, and a number of semantic fields have been described. However, some methodological limitations have still not been overcome: data collection is very labor-intensive, which affects either the size and representativeness of the language samples or ...

Added: June 2, 2020

Authorship Attribution in Russian in Real-World Forensics Scenario

Panicheva P., Litvinova T., in: Statistical Language and Speech Processing. 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11816 LNAI. Springer Publishing Company, 2019. P. 299–310.

Recent demands in authorship attribution, specifically, cross-topic authorship attribution with small numbers of training samples and very short texts, impose new challenges on corpora design, feature and algorithm development. In the current work we address these challenges by performing authorship attribution on a specifically designed dataset in Russian. We present a dataset of short written ...

Added: October 28, 2019

Computer and metaphor: when lexicon, morphology, punctuation, and other beasts fail to predict sentence metaphoricity

Badryzlova Y., Lyashevskaya O., Panicheva P., in: Когнитивные исследования языка. Вып. XXXVII: Интегративные процессы в когнитивной лингвистике: материалы международного конгресса по когнитивной лингвистике. Деком, 2019. Ch. IV. P. 609–615.

The paper provides linguistic explanations for the results of supervised machine learning experiments on the identification of verbal metaphor in Russian texts. We look at the classification accuracy of models based on different features (distributional semantics, lexical and morphosyntactic co-occurrence, etc.) and explore the behavior of verb constructions and wider context in order to investigate the reasons behind the ...

Added: October 23, 2019

Automatic construction of lexical typological questionnaires

Paperno D., Ryzhova D., in: Methodological Tools for Linguistic Description and Typology. Issue 16. University of Hawaii Press, 2019. Ch. 5. P. 45–61.

Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a Questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a Questionnaire is in its turn based on the ...

Added: August 30, 2019

Semantic Feature Aggregation for Gender Identification in Russian Facebook

Panicheva P., Mirzagitova A., Ledovaya Y., in: Artificial Intelligence and Natural Language, 6th Conference, AINL 2017, St. Petersburg, Russia, September 20–23, 2017, Revised Selected Papers. Issue 789. Switzerland: Springer, 2018. Ch. 1. P. 3–15.

*The Facebook social network is banned in Russia on the grounds of carrying out extremist activity. The goal of the current work is to evaluate semantic feature aggregation techniques in a task of gender classification of public social media texts in Russian. We collect Facebook posts of Russian-speaking users and use them as a dataset for two topic modelling ...

Added: February 18, 2019

Dark personalities on Facebook: Harmful online behaviors and language

Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159

The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...

Added: February 18, 2019

Constructing a typological questionnaire with distributional semantic models

Ryzhova D., Paperno D., in: The Typology of Physical Qualities. Amsterdam: John Benjamins Publishing Company, 2022. Ch. 9. P. 309–328.

Added: October 21, 2018

Distributional semantic features in Russian verbal metaphor identification

Panicheva P., Badryzlova Y., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 31 мая — 3 июня 2017 г.). Вып. 16 (23): В 2 т. Т. 1. М.: Изд-во РГГУ, 2017. P. 179–190.

Our experiment is aimed at evaluating the performance of distributional semantic features in metaphor identification in Russian raw text. We apply two types of distributional features representing similarity between the metaphoric/literal verb and its syntactic or linear context. Our approach is evaluated on a dataset of contexts for nine Russian verbs, which is made available to the community. The results ...

Added: August 30, 2018

Webvectors: A toolkit for building web interfaces for vector semantic models

Kutuzov A., Kuzmenko E., in: Supplementary Proceedings of the 5th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2016), Yekaterinburg, Russia, April 7-9, 2016. Vol. 1710. Aachen: CEUR Workshop Proceedings, 2016. P. 155–161.

The paper presents a free and open source toolkit whose aim is to quickly deploy web services handling distributed vector models of semantics. It fills the gap between training such models (many tools are already available for this) and dissemination of the results to the general public. Our toolkit, WebVectors, provides all the necessary routines for ...

Added: April 20, 2017