TextAnalyst Technology for Automatic Semantic Analysis of Text

?

TextAnalyst Technology for Automatic Semantic Analysis of Text

P. 156–167.

In book

Neuroinformatics and Semantic Representations: Theory and Applications

Cambridge Scholars Publishing, 2020.

Дискриминативная лемматизация сокращений в эпоху LLM

Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155

This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...

Added: March 10, 2026

Transformer-based approaches for lemmatizing abbreviations in Russian texts

Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47

This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...

Added: March 10, 2026

Эмоциональный анализ постов в ВКонтакте: классификатор или регрессор

Kolmogorova A., Калинин А. А., В кн.: Компьютерная лингвистика и интеллектуальные технологии: по материалам международной конференции «Диалог 2022», выпуск 21Вып. 21.: Изд-во РГГУ, 2022. С. 311–322.

The article summarizes the results of two tasks in machine learning paradigm: the task of classification according to the criterion of dominating emotion on the data of social networks posts in Russian and the regression task using the same data. The experiments are conducted on the data set collected from VKontakte social network and consisted of 3879 posts ...

Added: March 18, 2024

Machine learning approach for scientific and technical expertise

A. V. Belov, E. A. Egorova, Bulletin D. Serikbayev East Kazakhstan Technical University 2023 No. 4 P. 92–102

When conducting scientific and technical expertise, it is necessary to analyze the texts of reports on scientific research work. The analysis is carried out in order to determine whether the research being conducted belongs to the class of scientific research and development work in the field of IT. This article discusses the tasks of binary ...

Added: March 9, 2024

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorization of textual information, selection of a model for scientific paper clas- sification, and training of linguistic model BERT ...

Added: November 4, 2023

Near-Zero-Shot Suggestion Mining with a Little Help from WordNet

Alekseev A., Tutubalina E., Kwon S. et al., , in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers.: Cham: Springer, 2022. P. 23–36.

In this work we explore the constructive side of online reviews: advice, tips, requests, and suggestions that users provide about goods, venues and other items of interest. To reduce training costs and annotation efforts needed to build a classifier for a specific label set, we present and evaluate several entailment-based zero-shot approaches to suggestion classification ...

Added: April 10, 2023

Selection of Pseudo-Annotated Data for Adverse Drug Reaction Classification Across Drug Groups

Alimova I., Tutubalina E., , in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers.: Cham: Springer, 2022. P. 37–44.

Automatic monitoring of adverse drug events (ADEs) or reactions (ADRs) is currently receiving significant attention from the biomedical community. In recent years, user-generated data on social media has become a valuable resource for this task. Neural models have achieved impressive performance on automatic text classification for ADR detection. Yet, training and evaluation of these methods ...

Added: April 10, 2023

Способы автоматизации сравнительного анализа текстов при выявлении признаков плагиата в экспертизах по делам о нарушении авторских и смежных прав

Белова П.Е., Юрислингвистика 2023 № 27(38) С. 94–98

Within the framework of linguistic expertise on cases of copyright and related rights infringement experts are increasingly faced with the challenge of comparing several texts and searching for full-text, partial and other (lexical, grammatical, semantic, etc.) coincidences in them, as well as determining the values of these coincidences. Comparing documents manually takes a lot of ...

Added: April 6, 2023

Использование BERT для классификации коротких научных текстов на русском языке

Кусакин И. К., Цурупа А. М., Алмакаев А. В. et al., В кн.: НТИ-2022. Научная информация в современном мире: глобальные вызовы и национальные приоритеты : материалы 10-ой научной конференции с международным участием, посвященной 70-летию ВИНИТИ РАН, Москва, 25–26 октября 2022 года.: М.: ВИНИТИ РАН, 2022. С. 103–109.

This work is devoted to the study of approaches for training BERT-based classifiers of scientific articles to implement the application with the adoption of the best models for use in the infrastructure of the VINITI RAS. For this purpose, the BERT linguistic model was trained on a specialized corpus of scientific texts for subsequent use ...

Added: January 31, 2023

Исследование методов машинного обучения для классификации научных текстов на русском языке

Кусакин И. К., Федорец О. В., Romanov A., Научно-техническая информация. Серия 2: Информационные процессы и системы 2022 Т. 12 С. 6–9

This paper discusses modern approaches to natural language processing and appliance of artificial intelligence technologies in the task of classifying scientific texts in Russian. The report contains an analysis of implementations of text vectorization methods, a description of experiments with training various classifier models: from classical machine learning algorithms to neural network transformer architectures. ...

Added: January 31, 2023

Pulse of the Nation: Observable Subjective Well-Being in Russia Inferred from Social Network Odnoklassniki

Sergey Smetanin, Mathematics 2022 Vol. 10 No. 16 Article 2947

Policymakers and researchers worldwide are interested in measuring the subjective well-being (SWB) of populations. In recent years, new approaches to measuring SWB have begun to appear, using digital traces as the main source of information, and show potential to overcome the shortcomings of traditional survey-based methods. In this paper, we propose the formal model for ...

Added: August 15, 2022

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian

Smetanin S., PeerJ Computer Science 2022 No. 8 Article e1039

The Russian language is still not as well resourced as English, especially in the field of sentiment analysis of Twitter content. Though several sentiment analysis datasets of tweets in Russia exist, they all are either automatically annotated or manually annotated by one annotator. Thus, there is no inter-annotator agreement, or annotation may be focused on ...

Added: June 29, 2022

Using a Homogeneous Semantic Network to Classify the Results of Genetic Analysis

Kharlamov A. A., Kulikov A., , in: Neuroinformatics and Semantic Representations: Theory and Applications.: Cambridge Scholars Publishing, 2020. P. 219–231.

В работе показано использование механизма сравнения семантических сетей текстов в задаче диагностики заболеваний с использованием сигнальных сетей. Выявление степени пересечения семантических сетей текстов позволяет говорить о степени их смыслового подобия. Однородная семантическая сеть как множество узлов, связанных дугами, имеет численные характеристики – частоты появления слов, а также пар слов в тексте, которые перенормируются с использованием ...

Added: December 7, 2021