Automatic detection of grammatical aspect of Russian verbs based on their morphological properties

In book

Proceedings of the Fourth International Workshop on Resources and Tools for Derivational Morphology

Dubrovnik: Croatian Language Technology Society, 2023.

Fear and Loathing in Russian Literature: A Case of Emotion Annotation of Short Stories of the 20th Century

Anna Moskvina, Margarita Kirina, in: 27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, Vol. 2534. Springer, 2025. P. 113–129.

The paper presents an investigation of the emotional aspect of the Russian short story of the 20th century. Our study is two-fold: firstly, we delve into emotional representation at the lexical level, building upon previous work on utilizing vector models to quantify emotional content. In this study, we introduce an annotated corpus where words are ...

Added: November 29, 2024

Where Is Happily Ever After? A Study of Emotions and Locations in Russian Short Stories of 1900–1930

Moskvina A., Kirina M., in: Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023). Springer, 2023. P. 123–135.

The paper tackles the problem of automatically detecting emotions in literary texts using distributional semantics techniques. The experiment was carried out on Russian short stories from the 1900s–1930s. We investigated the distribution of emotional lexis across different locations in narratives. First, we calculated the semantic association score between each word ...

Added: December 9, 2023

Unhappy in Their Own Way: How Can the Sentiment of a Literary Text Be Measured?

Sherstinova T., Moskvina A., Kirina M. et al., in: Proceedings of the International Conference "Corpus Linguistics 2023". St. Petersburg: St. Petersburg State University Press, 2024. P. 232–240.

This experimental study compares the results of three approaches to evaluating the sentiment of literary texts: dictionary-based, machine learning, and distributional semantics. The material for analysis was a selection of 210 stories by Russian writers from the first three decades of the 20th century. The research showed that the correlation ...

Added: December 9, 2023

From Love to Hate: The Distribution of Emotional Vocabulary in Russian Short Stories of the Early 20th Century

Moskvina A., Kirina M., in: Proceedings of the International Conference "Corpus Linguistics 2023". St. Petersburg: St. Petersburg State University Press, 2024. P. 156–166.

The paper presents the results of experiments investigating the distribution of emotional vocabulary in Russian short stories of the beginning of the 20th century. The emotionality of words and texts is determined automatically using the methods of distributional semantics, which do not require the use of dictionaries or preliminary data annotation. The results include data ...

Added: December 9, 2023

SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task

Razzhigaev A., Nikolay Arefyev, Panchenko A., in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 157–162.

In this paper, we present a system for the solution of the cross-lingual and multilingual word-in-context disambiguation task. Task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To address the lack of the officially provided cross-lingual training data, we decided to generate such data ourselves. We describe a simple ...

Added: September 23, 2021

LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation

Davletov A., Nikolay Arefyev, Gordeev D. et al., in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 780–786.

This paper presents our approaches to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation task. The first approach attempted to reformulate the task as a question answering problem, while the second one framed it as a binary classification problem. Our best system, which is an ensemble of XLM-R based binary classifiers trained with data augmentation, ...

Added: September 23, 2021

GlossReader at SemEval-2021 Task 2: Reading Definitions Improves Contextualized Word Embeddings

Rachinskiy Maxim, Arefyev Nikolay, in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 756–762.

Added: September 23, 2021

Automated Detection of Non-Relevant Posts on the Russian Imageboard "2ch": Importance of the Choice of Word Representations

Bakarov A., Gureenkova O., Lecture Notes in Computer Science 2018 P. 16–21

This study considers the problem of automated detection of non-relevant posts on Web forums and discusses an approach that approximates this problem with the task of detecting semantic relatedness between a given post and the opening post of the forum discussion thread. The approximated task could be resolved through learning the ...

Added: December 12, 2020

The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics

Bakarov A., Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB '18) 2018 P. 153–161

Swivel (Submatrix-WIse Vector Embedding Learner) is a distributional semantic model based on counting point-wise mutual information (PMI) values, capable of capturing word-context co-occurrences in the PMI matrix that were not observed in the training corpus. This model outperforms mainstream word embedding training algorithms such as Continuous Bag-of-Words, GloVe and Skip-Gram in word similarity and word analogy ...

Added: December 12, 2020

A corpus study of semantic coherence in schizophrenia in Russian written texts

Panicheva P., Litvinova T., in: The Fifth Saint Petersburg Winter Workshop on Experimental Studies of Speech and Language (Night Whites 2019). St. Petersburg: Center for Scientific and Information Technologies "Asterion", 2019. P. 81.

Added: October 29, 2020

Semantic Coherence in Schizophrenia in Russian Written Texts

Panicheva P., Litvinova T., in: Proceedings of the 25th Conference of Open Innovations Association FRUCT, University of Helsinki, Helsinki, Finland. Helsinki: IEEE, 2019. P. 241–249.

Schizophrenia is widely known to manifest in language disturbance. Namely, speech incoherence, tangentiality, and derailment are indicative of the thought disorder characteristic of schizophrenia. Recent advances in distributional semantics have made it possible to measure coherence in text in a unified and objective manner. It has been shown that semantic coherence measures based on distributional semantic models ...

Added: October 29, 2020

Exploring Semantic Concreteness and Abstractness for Metaphor Identification and Beyond

Badryzlova Y., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, June 17–20, 2020). Issue 19(26). Moscow: RSUH Publishing House, 2020. P. 33–47.

The paper presents a method for computing indexes of semantic concreteness and abstractness in two languages (Russian and English). These indexes are used in metaphor identification experiments in both languages; the results are either comparable to or surpass previous work and the baselines. We analyze the obtained indexes of concreteness and abstractness to see how ...

Added: August 24, 2020

Lexical Typology: Computational Methods and Tools

Ryzhova D., St. Petersburg: Aletheia, 2020.

Lexical typology, the field of linguistics concerned with the comparative analysis of word meanings across languages, has achieved considerable success to date: methods for collecting and analyzing data have been developed, and a whole range of semantic fields has been described. However, some methodological limitations have still not been overcome: data collection is very labor-intensive, which affects either the size and representativeness of language samples or ...

Added: June 2, 2020

Authorship Attribution in Russian in Real-World Forensics Scenario

Panicheva P., Litvinova T., in: Statistical Language and Speech Processing: 7th International Conference, SLSP 2019, Ljubljana, Slovenia, October 14–16, 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11816 LNAI. Springer Publishing Company, 2019. P. 299–310.

Recent demands in authorship attribution, specifically, cross-topic authorship attribution with small numbers of training samples and very short texts, impose new challenges on corpora design, feature and algorithm development. In the current work we address these challenges by performing authorship attribution on a specifically designed dataset in Russian. We present a dataset of short written ...

Added: October 28, 2019

Computer and metaphor: when lexicon, morphology, punctuation, and other beasts fail to predict sentence metaphoricity

Badryzlova Y., Lyashevskaya O., Panicheva P., in: Cognitive Studies of Language. Issue XXXVII: Integrative Processes in Cognitive Linguistics: Proceedings of the International Congress on Cognitive Linguistics. Dekom, 2019. Ch. IV. P. 609–615.

The paper provides linguistic explanations for the results of supervised machine learning experiments on the identification of verbal metaphor in Russian texts. We look at the classification accuracy of models based on different features (distributional semantics and lexical and morphosyntactic co-occurrence, etc.) and explore the behavior of verb constructions and wider context in order to investigate the reasons behind the ...

Added: October 23, 2019

The Semantic Component 'Completive' and the "Lexeme-by-Lexeme" Approach to Actional Classification in Russian

Fedotov M., Voprosy Jazykoznanija 2019 No. 3 P. 7–44

The paper discusses two related aspectological topics. The first section examines the 'completive' meaning, i.e. 'attainment of the internal limit' (together with its counterpart 'incompletive', i.e. 'non-attainment of the internal limit'). Its localization in the semantic structure of the utterance is determined: between aspect proper and actionality proper. Also, 'completive' can be included under ...

Added: September 28, 2019

Automatic construction of lexical typological questionnaires

Paperno D., Ryzhova D., in: Methodological Tools for Linguistic Description and Typology. Issue 16. University of Hawaii Press, 2019. Ch. 5. P. 45–61.

Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a Questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a Questionnaire is in its turn based on the ...

Added: August 30, 2019

Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions

Panicheva P., Protopopova E., Bukia G. et al., in: Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7–9, 2016, Revised Selected Papers. Communications in Computer and Information Science, Vol. 661. Switzerland: Springer, 2017. P. 236–247.

In the paper, vector-space semantic models based on the Word2Vec word embedding algorithm and a count-based association-oriented algorithm are evaluated and compared by measuring association strength between Russian nouns and adjectives. A dataset of nouns and associated adjectives is used as the test set for a pseudo-disambiguation task. Models are trained with corpora of Russian fiction. A ...

Added: February 18, 2019

Semantic Feature Aggregation for Gender Identification in Russian Facebook

Panicheva P., Mirzagitova A., Ledovaya Y., in: Artificial Intelligence and Natural Language, 6th Conference, AINL 2017, St. Petersburg, Russia, September 20–23, 2017, Revised Selected Papers. Issue 789. Switzerland: Springer, 2018. Ch. 1. P. 3–15.

*The operation of the social network Facebook is prohibited in Russia on the grounds of extremist activity. The goal of the current work is to evaluate semantic feature aggregation techniques in a task of gender classification of public social media texts in Russian. We collect Facebook posts of Russian-speaking users and use them as a dataset for two topic modelling ...

Added: February 18, 2019

Dark personalities on Facebook: Harmful online behaviors and language

Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159

*The operation of the social network Facebook is prohibited in Russia on the grounds of extremist activity. The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...

Added: February 18, 2019